Goal of Collecting this Dataset:

Our goal is to predict 10-year ASCVD risk in adults using key features such as age, gender, race, smoking status, diabetes, hypertension, and cholesterol levels. The dataset aims to facilitate accurate risk assessments and guide targeted preventive healthcare interventions.

Source of the dataset: HeartRisk

It consist of 1000 row that each have 10 attributes.

Class label:

“Risk”; 10-year risk for ASCVD which is categorized as:

Low-risk (<5%)
Borderline risk (5% to 7.4%)
Intermediate risk (7.5% to 19.9%)
High risk (≥20%)

Type of attributes:

##         Attribute_Name          Description         Data_Type
## 1               isMale               Gender            Binary
## 2              isBlack                 Race            Binary
## 3             isSmoker       Smoking Status            Binary
## 4           isDiabetic      Diabetes Status            Binary
## 5       isHypertensive  Hypertension Status            Binary
## 6                  Age Age of the candidate Numeric (Integer)
## 7             Systolic   Max Blood Pressure Numeric (Integer)
## 8          Cholesterol    Total Cholesterol Numeric (Integer)
## 9                  HDL      HDL Cholesterol Numeric (Integer)
## 10 Risk  (class label)   10-year ASCVD Risk Numeric (Decimal)
##                             Possible_Values
## 1                      0 (Female), 1 (Male)
## 2                  0 (Not Black), 1 (Black)
## 3                0 (Non-smoker), 1 (Smoker)
## 4                  0 (Normal), 1 (Diabetic)
## 5                0 (Normal BP), 1 (High BP)
## 6                       Range between 40-79
## 7                      Range between 90-200
## 8                     Range between 130-200
## 9                      Range between 20-100
## 10 Low, Borderline, Intermediate, High risk

Table that shows our Dataset before any modifications:

library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
dataset <- read.csv("heartRisk.csv")
print(dataset)
##      isMale isBlack isSmoker isDiabetic isHypertensive Age Systolic Cholesterol
## 1         1       1        0          1              1  49      101         181
## 2         0       0        0          1              1  69      167         155
## 3         0       1        1          1              1  50      181         147
## 4         1       1        1          1              0  42      145         166
## 5         0       0        1          0              1  66      134         199
## 6         0       0        1          0              1  52      154         174
## 7         1       0        1          0              0  40      104         187
## 8         1       0        1          1              0  75      136         189
## 9         0       0        1          0              1  42      169         179
## 10        1       0        0          1              1  65      196         187
## 11        1       0        1          1              0  72      154         179
## 12        0       1        0          0              1  57      195         188
## 13        1       0        0          0              1  43      136         131
## 14        1       1        1          0              1  61      129         156
## 15        1       0        0          0              1  54      165         197
## 16        0       0        1          1              0  67      127         137
## 17        1       1        0          0              1  73      140         159
## 18        0       1        1          0              1  40      200         181
## 19        1       1        1          1              1  64      167         189
## 20        0       1        0          1              1  76      192         130
## 21        0       0        1          1              0  63      106         145
## 22        1       1        0          0              0  74       99         167
## 23        0       0        1          1              1  45      146         153
## 24        1       1        1          0              1  59      186         170
## 25        1       1        0          1              1  60      141         148
## 26        1       1        0          1              1  68      119         177
## 27        0       1        0          1              1  60      131         135
## 28        1       1        0          0              1  54      169         152
## 29        1       0        1          0              1  64      143         181
## 30        1       1        0          0              0  68      198         194
## 31        0       0        0          1              1  56      180         147
## 32        0       0        0          0              1  46      119         195
## 33        1       0        0          0              1  57      114         187
## 34        1       1        1          0              1  59      147         133
## 35        0       1        1          0              0  70      165         135
## 36        1       1        0          1              1  56      160         145
## 37        0       0        1          1              0  73      188         172
## 38        1       1        0          0              1  52      126         172
## 39        1       0        1          1              0  67       94         189
## 40        1       1        0          0              0  60      143         188
## 41        0       1        1          1              1  75      181         154
## 42        0       0        1          0              0  61      174         163
## 43        1       0        0          0              0  59      174         197
## 44        0       0        0          0              0  47      167         154
## 45        0       1        0          0              0  42       92         184
## 46        1       1        0          1              0  67      155         199
## 47        0       1        1          0              0  57      161         198
## 48        1       0        1          1              0  40       90         160
## 49        1       0        0          0              0  41       99         169
## 50        0       0        1          1              0  77      178         138
## 51        1       1        1          1              1  45      166         164
## 52        1       1        1          1              1  68      121         132
## 53        0       1        0          1              0  76      111         153
## 54        0       1        1          1              0  51      114         162
## 55        0       0        1          0              1  76      178         157
## 56        0       0        0          0              0  52      192         148
## 57        0       1        1          1              0  68      169         147
## 58        0       0        0          1              0  47      125         146
## 59        1       0        0          1              0  58      113         185
## 60        0       1        0          0              1  78      103         139
## 61        0       1        1          1              0  41      145         159
## 62        0       0        1          0              1  61      101         177
## 63        0       0        1          1              1  44      157         130
## 64        1       1        0          1              1  79      150         178
## 65        1       1        1          0              1  45      169         148
## 66        0       1        0          1              0  75      138         149
## 67        1       1        0          0              1  41      182         179
## 68        0       1        1          0              1  40      148         194
## 69        1       1        1          1              0  75      106         142
## 70        1       1        0          0              0  75      107         130
## 71        0       0        1          1              0  58      102         200
## 72        1       1        1          1              1  47      129         189
## 73        0       0        1          1              1  70      186         179
## 74        0       1        1          1              1  69      123         130
## 75        0       1        0          0              0  57      112         199
## 76        0       0        0          1              1  69      130         189
## 77        1       0        0          0              0  58       91         187
## 78        1       1        0          0              0  40      125         183
## 79        1       1        1          0              1  53      187         175
## 80        0       1        1          1              0  71      150         176
## 81        0       1        1          0              1  78      185         137
## 82        0       1        1          1              0  59      190         146
## 83        0       1        0          0              1  67       99         167
## 84        0       1        0          0              1  78      138         193
## 85        1       0        0          0              1  62      134         175
## 86        1       0        0          1              0  73      102         137
## 87        1       1        0          0              0  55      176         138
## 88        1       1        0          1              0  46      101         187
## 89        0       1        1          0              0  51      136         168
## 90        1       0        0          1              1  49      191         131
## 91        0       1        1          0              0  52       95         145
## 92        0       1        1          1              0  51      184         153
## 93        0       1        0          1              0  74       99         187
## 94        1       0        1          0              1  73      127         142
## 95        1       1        1          0              1  58      183         154
## 96        0       0        1          0              1  45      100         168
## 97        0       1        0          0              1  66      152         162
## 98        1       1        0          1              1  72      169         164
## 99        0       0        1          0              0  58      100         196
## 100       1       0        1          0              1  79      188         135
## 101       0       0        0          0              1  60      167         146
## 102       1       0        1          0              0  76      118         154
## 103       0       1        0          0              1  49      102         157
## 104       1       1        1          0              1  64      124         134
## 105       0       1        1          0              0  42      142         156
## 106       1       0        0          1              0  71      193         130
## 107       1       0        0          0              1  51      189         136
## 108       0       0        0          1              0  50       95         174
## 109       0       0        0          1              0  72      106         157
## 110       1       0        1          1              0  51      100         131
## 111       0       0        1          0              0  55       95         136
## 112       0       0        1          1              0  71      157         182
## 113       0       1        1          1              0  44      175         151
## 114       1       0        1          0              0  75      159         150
## 115       0       0        0          1              1  49       95         141
## 116       0       0        0          0              0  72      135         131
## 117       0       0        0          0              0  47      106         197
## 118       1       1        0          0              1  46      126         138
## 119       0       1        1          1              0  75      196         185
## 120       0       0        1          1              1  69      186         154
## 121       1       1        1          0              0  70       94         187
## 122       1       1        0          1              0  42      192         169
## 123       0       1        1          0              0  70      164         177
## 124       1       1        0          0              1  49      112         147
## 125       1       0        0          1              0  64      108         149
## 126       1       1        1          0              0  44      178         161
## 127       0       1        0          0              0  51      188         158
## 128       1       1        0          0              0  49      119         154
## 129       0       0        1          1              1  57      110         173
## 130       0       1        1          1              1  69      104         157
## 131       0       0        0          1              0  56      114         177
## 132       0       0        0          0              1  55      163         148
## 133       0       1        1          0              0  76      102         191
## 134       1       1        1          1              1  73      128         131
## 135       0       1        0          0              1  57      161         193
## 136       0       1        0          1              0  63      170         194
## 137       1       0        1          1              0  54      184         138
## 138       0       0        1          0              1  40      173         155
## 139       1       1        0          0              1  47       94         153
## 140       0       0        1          1              0  51      161         163
## 141       0       1        0          1              0  48      187         170
## 142       1       0        0          1              1  49      144         150
## 143       0       0        1          1              0  76      117         136
## 144       1       1        1          1              1  69      104         193
## 145       1       1        0          1              1  53      151         170
## 146       1       0        0          0              0  63      121         178
## 147       1       0        1          1              1  50      191         155
## 148       1       1        0          1              0  76      193         190
## 149       0       0        0          0              0  67      143         197
## 150       1       0        1          0              0  42      119         171
## 151       1       0        1          1              1  70      118         137
## 152       0       1        1          1              1  48      189         173
## 153       1       1        1          0              0  46      109         173
## 154       1       1        0          0              0  60      115         178
## 155       1       1        1          0              0  46      116         180
## 156       1       0        1          1              1  62      144         136
## 157       1       1        0          1              0  67      168         132
## 158       0       1        0          0              1  50      144         186
## 159       0       0        0          1              1  70      168         165
## 160       0       0        1          0              0  50      154         131
## 161       1       1        0          0              1  71      161         179
## 162       1       1        1          1              0  50      189         191
## 163       0       0        1          0              1  67      197         192
## 164       0       1        1          1              1  54      122         135
## 165       1       1        1          0              0  47      190         193
## 166       1       0        1          1              1  40      151         170
## 167       0       0        0          1              1  43      125         174
## 168       1       0        0          0              1  44      169         183
## 169       0       0        1          1              0  43      146         171
## 170       0       1        1          1              0  74      124         164
## 171       1       1        0          1              1  77      131         198
## 172       0       0        1          1              0  61       92         164
## 173       0       1        0          1              0  49      187         194
## 174       1       0        1          0              1  69      164         173
## 175       0       0        1          0              1  64      178         155
## 176       1       0        1          1              0  40      104         142
## 177       1       1        1          0              0  43      129         196
## 178       1       1        0          1              1  62      160         166
## 179       0       1        0          0              1  62      117         133
## 180       1       0        0          1              0  48      151         142
## 181       1       1        0          0              0  71      105         192
## 182       0       0        0          0              1  46       96         136
## 183       1       1        0          1              0  74      142         195
## 184       0       1        0          0              0  51      149         155
## 185       0       1        1          0              0  60      121         172
## 186       1       1        0          0              1  72      158         183
## 187       0       1        0          1              1  44      178         193
## 188       0       0        0          0              1  74      195         140
## 189       0       0        1          0              1  43      146         178
## 190       0       0        1          1              1  61      182         140
## 191       0       0        0          1              1  75      128         152
## 192       1       1        0          0              0  62      192         135
## 193       1       1        0          1              0  45      145         154
## 194       1       0        0          1              0  71      174         176
## 195       0       1        0          0              0  58      160         144
## 196       1       1        1          0              0  77      123         182
## 197       0       0        1          0              1  56      193         138
## 198       1       1        0          1              1  56      129         154
## 199       1       0        0          1              1  74      175         146
## 200       1       0        0          0              0  41       91         156
## 201       1       1        0          1              1  77      197         130
## 202       1       1        0          1              0  69      140         150
## 203       0       1        0          0              1  72      125         164
## 204       1       1        0          0              1  61      120         173
## 205       1       0        1          1              1  79      128         160
## 206       1       1        1          1              1  57      120         141
## 207       0       1        0          1              0  66      106         187
## 208       1       1        1          0              1  69      161         175
## 209       0       1        1          1              0  70      132         149
## 210       1       0        0          0              0  61      170         164
## 211       1       0        1          1              0  58      127         184
## 212       0       0        1          0              1  67      186         154
## 213       1       0        1          1              1  73      187         167
## 214       1       1        1          1              0  40       93         161
## 215       1       1        1          0              0  68      155         131
## 216       1       1        0          0              0  41      126         184
## 217       0       0        0          0              1  71      196         186
## 218       1       1        1          1              1  56      174         187
## 219       0       0        0          1              0  57      179         185
## 220       1       1        1          0              1  46      200         195
## 221       0       1        0          1              1  78      157         188
## 222       0       0        1          1              0  61      189         170
## 223       0       1        1          0              1  46       90         170
## 224       1       0        1          0              1  69      124         131
## 225       0       0        1          1              1  74      181         176
## 226       0       1        1          0              0  68      118         196
## 227       0       0        0          1              0  65      175         135
## 228       1       0        0          1              1  68      154         176
## 229       1       1        1          1              1  48      150         167
## 230       1       1        1          0              0  76      141         176
## 231       0       0        0          0              0  76      181         169
## 232       0       0        0          1              1  67      135         197
## 233       0       0        0          0              1  69      148         169
## 234       1       1        0          1              1  69      188         182
## 235       0       0        0          1              0  47      182         193
## 236       0       0        1          0              1  56      133         145
## 237       1       1        1          1              0  78      152         146
## 238       1       1        0          1              0  52      178         149
## 239       1       0        1          0              1  49      150         186
## 240       1       0        1          1              1  75      137         176
## 241       1       0        1          1              1  40      152         130
## 242       0       1        0          1              0  69      103         198
## 243       0       1        1          0              1  61      138         136
## 244       0       1        0          0              0  56       90         182
## 245       0       0        0          0              0  61      190         147
## 246       0       1        0          0              0  66      106         187
## 247       1       1        0          0              0  41      142         190
## 248       1       1        1          1              1  65      117         160
## 249       1       1        0          1              0  53       94         199
## 250       1       0        0          0              1  43      122         135
## 251       1       1        0          0              1  65      105         184
## 252       0       0        0          1              0  59      181         178
## 253       0       1        1          0              0  54      116         176
## 254       0       0        1          0              1  55      188         146
## 255       0       1        0          1              1  64      123         184
## 256       1       0        1          0              1  56      132         181
## 257       1       0        1          0              0  50      108         145
## 258       1       1        1          0              1  70      129         143
## 259       0       1        0          0              0  73       94         190
## 260       1       1        1          1              0  46      110         198
## 261       0       0        0          0              0  54       94         132
## 262       1       0        1          0              0  63      189         148
## 263       0       1        0          0              1  53      140         164
## 264       0       0        0          1              1  48      138         181
## 265       0       0        0          1              1  54      181         142
## 266       0       0        0          0              0  42      143         160
## 267       1       0        1          0              1  60      130         133
## 268       1       0        0          0              0  74       95         169
## 269       1       0        0          1              1  69      193         172
## 270       0       1        1          0              1  61      105         139
## 271       1       0        1          1              1  76      137         146
## 272       0       1        0          1              1  58      198         139
## 273       0       1        0          0              1  70      125         135
## 274       1       0        0          1              0  50      181         161
## 275       1       1        0          0              1  68       99         166
## 276       0       1        1          0              0  63       98         173
## 277       0       0        1          1              1  72      118         188
## 278       0       1        1          0              1  73      156         137
## 279       1       1        1          0              1  70      120         176
## 280       0       1        1          0              0  45      170         158
## 281       0       1        1          0              0  70      183         133
## 282       1       1        0          0              1  58      156         178
## 283       0       0        1          1              1  63      124         139
## 284       1       1        0          1              0  73      183         187
## 285       0       0        0          1              0  57      105         141
## 286       0       0        1          1              0  49      189         174
## 287       0       0        1          0              0  43      122         159
## 288       0       1        0          0              0  74      106         185
## 289       1       1        0          0              0  75      137         136
## 290       1       0        0          0              1  73      130         161
## 291       0       1        1          1              0  60      145         167
## 292       1       1        0          0              0  43      191         180
## 293       0       1        1          0              1  47      200         190
## 294       1       0        1          1              1  66      177         177
## 295       1       1        1          0              1  66      117         139
## 296       0       0        1          1              1  49       99         186
## 297       0       1        0          0              0  71      194         156
## 298       0       1        0          1              1  60      126         172
## 299       0       0        0          1              1  68      105         167
## 300       0       0        0          0              1  60      194         151
## 301       0       0        1          1              0  54      154         144
## 302       1       1        1          0              1  60      159         155
## 303       1       0        0          1              1  59      166         172
## 304       0       0        1          1              0  48       98         155
## 305       0       1        0          0              0  54      188         150
## 306       1       0        0          1              1  63      199         158
## 307       0       0        1          1              1  66      127         168
## 308       0       0        1          0              0  46      158         130
## 309       1       0        1          1              1  73      161         165
## 310       1       0        1          1              0  57      113         191
## 311       1       0        1          0              0  41       90         160
## 312       1       0        1          1              1  76      163         136
## 313       1       1        0          0              1  69      191         136
## 314       0       1        1          1              1  72      192         143
## 315       1       1        1          1              0  50      157         141
## 316       1       0        1          0              0  77      105         169
## 317       0       0        1          1              1  54      155         134
## 318       1       1        1          0              1  52      112         139
## 319       1       1        1          0              0  42      165         177
## 320       0       0        0          1              1  62      154         198
## 321       0       0        1          1              0  64      158         170
## 322       1       0        0          1              1  79      152         155
## 323       1       0        1          1              0  65      191         146
## 324       0       1        0          1              1  56      178         138
## 325       1       0        0          0              1  78      126         182
## 326       1       0        1          0              1  62      107         188
## 327       0       0        1          0              0  49      165         135
## 328       0       1        1          1              1  69      135         190
## 329       1       0        0          0              1  71      122         141
## 330       1       0        0          1              0  70      144         174
## 331       0       1        1          0              0  64      139         143
## 332       0       0        0          1              0  77      160         171
## 333       1       1        0          1              0  64      200         187
## 334       0       1        1          0              0  52      103         163
## 335       1       0        0          0              0  61      196         159
## 336       1       1        0          1              1  40       92         158
## 337       1       0        1          0              1  42      117         149
## 338       0       0        1          1              1  48      114         181
## 339       0       1        0          0              0  66      154         193
## 340       0       1        0          1              1  53       98         183
## 341       1       0        0          1              0  43      125         163
## 342       1       1        0          1              0  44      153         163
## 343       0       1        0          0              0  62      159         162
## 344       0       1        0          0              1  44      123         170
## 345       1       1        1          0              1  79      100         188
## 346       1       1        1          1              1  55      103         161
## 347       0       1        0          0              0  48      165         145
## 348       0       1        0          1              1  48      119         181
## 349       0       0        0          1              0  65      156         192
## 350       1       0        1          1              1  48       96         153
## 351       1       0        1          1              0  44      185         187
## 352       0       1        0          1              1  76      163         136
## 353       1       0        0          1              1  49      107         150
## 354       0       1        0          1              1  65      184         146
## 355       0       0        1          1              0  51      126         133
## 356       1       0        1          1              1  41       93         154
## 357       0       1        0          1              0  62      110         193
## 358       0       0        0          0              1  48      135         147
## 359       0       0        0          1              1  45      118         193
## 360       0       1        0          1              1  71      169         189
## 361       1       1        0          1              1  60      124         172
## 362       1       1        0          1              1  48      154         158
## 363       1       0        0          0              1  69       97         199
## 364       1       1        0          1              0  60      157         174
## 365       1       1        1          0              0  53      184         183
## 366       1       1        1          0              0  41      170         157
## 367       1       0        0          1              0  42      118         150
## 368       0       1        1          0              1  64      100         184
## 369       0       1        0          1              1  61      149         167
## 370       0       0        1          1              0  76      185         196
## 371       1       1        0          0              0  49      181         135
## 372       0       0        0          1              1  63       95         149
## 373       1       0        1          1              1  48      171         180
## 374       0       0        0          0              1  50      168         166
## 375       1       1        0          1              0  55      108         156
## 376       0       0        1          0              0  78      128         165
## 377       1       0        1          0              1  68      106         184
## 378       0       1        0          0              0  59      191         149
## 379       1       1        1          1              0  55      147         186
## 380       1       1        0          0              0  64      200         175
## 381       1       1        0          0              1  54      140         156
## 382       1       1        0          0              1  75      149         149
## 383       1       0        0          0              0  64      118         166
## 384       0       0        1          1              0  70      196         182
## 385       1       0        1          1              0  68      162         150
## 386       1       1        0          0              1  71      154         149
## 387       0       1        0          1              1  73      106         185
## 388       1       0        0          1              0  59      192         180
## 389       1       1        1          1              1  47      197         150
## 390       1       1        1          1              0  57      139         155
## 391       0       0        1          1              1  55      176         132
## 392       0       1        0          1              0  79       94         192
## 393       0       0        1          1              1  46      112         137
## 394       0       0        1          0              0  45      162         160
## 395       0       0        1          0              1  76      180         189
## 396       1       1        1          0              0  57      160         146
## 397       1       0        0          1              0  49      167         161
## 398       1       0        1          1              1  46      155         182
## 399       1       1        0          1              0  57       97         180
## 400       0       1        1          1              0  77       95         184
## 401       1       0        1          1              1  44      130         190
## 402       1       1        0          0              0  72      178         135
## 403       0       1        1          0              1  46      188         173
## 404       0       0        1          1              1  54      161         190
## 405       1       0        0          0              0  55      187         135
## 406       1       0        0          1              1  42      149         190
## 407       1       0        1          1              0  63      182         181
## 408       1       0        1          0              1  48      133         174
## 409       1       1        1          0              1  77      188         176
## 410       0       1        1          0              0  77      155         137
## 411       0       1        0          1              0  72      166         154
## 412       1       1        0          0              0  60       90         187
## 413       0       0        0          1              1  43      159         156
## 414       0       0        0          0              0  49      112         147
## 415       1       0        1          1              1  50      118         177
## 416       1       1        1          1              0  55      189         154
## 417       1       0        1          0              0  44      138         192
## 418       1       1        1          1              0  46      199         151
## 419       0       1        1          0              1  72      157         194
## 420       1       0        1          1              1  44      175         148
## 421       1       0        0          0              0  60      116         148
## 422       0       0        0          1              1  66      121         197
## 423       0       0        1          0              1  44      180         199
## 424       0       1        0          0              1  48      153         191
## 425       1       0        0          1              1  51      102         164
## 426       0       0        1          0              1  72      169         179
## 427       1       0        0          0              0  67      189         161
## 428       0       1        0          1              0  66      123         182
## 429       0       1        0          1              1  53      133         181
## 430       0       0        1          1              0  46      129         176
## 431       1       1        1          0              0  71      167         176
## 432       0       0        0          1              0  61      135         151
## 433       1       1        1          0              1  67      174         172
## 434       1       1        0          1              1  43      141         143
## 435       0       0        1          1              1  69      106         183
## 436       0       1        0          0              1  75      134         137
## 437       0       1        0          0              1  44      193         200
## 438       0       1        1          0              0  49      180         195
## 439       1       0        1          0              1  51      170         144
## 440       0       0        1          1              0  63      105         139
## 441       1       0        1          0              1  58      181         163
## 442       1       0        0          1              0  51      109         174
## 443       1       0        1          1              1  63      124         135
## 444       1       1        0          0              0  79      107         195
## 445       0       0        1          1              0  71      200         141
## 446       0       1        1          1              1  71      157         146
## 447       0       1        0          0              1  50       91         155
## 448       1       0        0          1              0  76      179         166
## 449       1       1        1          0              0  61      130         192
## 450       0       0        1          1              1  59      122         171
## 451       1       0        0          1              0  57       96         135
## 452       0       0        1          1              1  42      153         148
## 453       1       1        0          1              0  46      200         173
## 454       1       1        1          1              0  57      136         168
## 455       0       0        0          1              1  70      104         146
## 456       1       0        1          1              0  65      140         188
## 457       0       1        0          0              1  49      115         186
## 458       0       1        1          0              1  53       94         135
## 459       1       1        1          1              0  42      186         149
## 460       1       1        0          0              1  74      126         169
## 461       0       1        1          1              0  65      150         134
## 462       0       0        1          1              0  66      119         136
## 463       0       1        1          0              1  48      167         147
## 464       1       1        0          0              0  51      119         177
## 465       1       0        1          1              1  47      145         186
## 466       0       0        1          1              1  54       91         180
## 467       0       1        1          1              0  66      156         130
## 468       0       0        1          0              1  71      137         156
## 469       0       0        1          0              0  73      175         137
## 470       0       0        1          1              0  56      166         174
## 471       1       1        1          1              0  65      118         190
## 472       0       1        1          0              0  70      103         142
## 473       0       1        1          0              1  50      155         199
## 474       0       1        1          1              0  72       95         134
## 475       1       0        0          1              0  75      195         144
## 476       0       0        0          1              0  56      113         159
## 477       0       0        0          1              0  56      191         185
## 478       0       0        1          0              0  51      180         135
## 479       1       0        0          1              1  75      109         135
## 480       1       0        1          1              0  68      144         185
## 481       0       1        1          0              0  40      102         152
## 482       0       1        0          0              0  72       95         176
## 483       1       1        0          0              1  65      197         143
## 484       1       0        1          0              1  52      146         134
## 485       0       0        1          1              0  51      159         182
## 486       1       1        1          1              1  73      172         176
## 487       0       1        0          1              0  73      186         150
## 488       0       1        1          0              0  46      184         131
## 489       0       0        1          1              1  72      180         170
## 490       1       1        0          1              0  76       97         143
## 491       1       1        1          1              0  53      163         152
## 492       1       0        0          0              1  43      106         161
## 493       0       0        1          0              1  52      108         142
## 494       1       0        1          1              0  48      129         185
## 495       1       1        0          0              1  68      107         156
## 496       1       0        1          1              1  70      166         173
## 497       1       0        1          0              1  59      118         181
## 498       0       0        1          0              0  54      166         154
## 499       1       1        0          1              1  47      162         163
## 500       0       1        0          0              0  65      121         175
## 501       0       0        0          1              0  78      122         161
## 502       1       1        0          0              0  42       97         142
## 503       1       0        0          0              1  44      144         157
## 504       0       0        0          0              0  42      100         141
## 505       0       0        0          1              0  41       90         144
## 506       1       1        0          0              0  43      118         195
## 507       0       1        0          1              0  52      136         198
## 508       0       0        0          0              0  48       95         137
## 509       0       1        0          0              1  61      200         167
## 510       1       1        1          0              0  43      116         158
## 511       0       0        1          1              1  75      171         165
## 512       0       0        0          1              0  54      150         180
## 513       0       1        1          1              0  55      141         153
## 514       1       1        1          1              0  68      161         146
## 515       0       1        0          1              0  68      139         143
## 516       0       0        1          0              1  43      147         145
## 517       1       1        0          1              0  71      199         197
## 518       1       1        0          0              1  59      193         131
## 519       0       0        1          0              1  70      196         180
## 520       0       0        1          1              0  76      175         137
## 521       1       0        1          0              0  67      166         132
## 522       0       1        1          1              1  73      122         172
## 523       0       0        0          0              0  56      111         160
## 524       0       1        1          1              1  61       92         169
## 525       1       0        0          1              0  47       92         160
## 526       0       0        0          1              1  75      163         140
## 527       0       1        0          0              1  76      166         178
## 528       1       1        0          1              1  44      108         176
## 529       0       0        0          1              1  40      198         173
## 530       1       0        0          0              0  71      134         199
## 531       1       1        1          1              1  67      178         182
## 532       0       1        0          1              0  45      107         132
## 533       1       1        1          1              1  63      108         134
## 534       0       0        0          0              1  56      191         142
## 535       1       0        0          0              0  70      136         134
## 536       1       0        1          0              0  55      189         167
## 537       0       1        0          1              1  72       95         135
## 538       0       1        1          1              0  74      177         154
## 539       0       1        1          0              1  51      154         140
## 540       1       1        1          0              1  48      140         142
## 541       1       0        1          0              0  48      138         195
## 542       1       1        1          1              0  76      168         176
## 543       1       1        0          1              0  72      188         146
## 544       1       1        1          0              1  56      171         170
## 545       1       1        0          1              1  53      105         133
## 546       1       0        1          0              0  68      117         193
## 547       1       1        0          1              0  44      150         163
## 548       0       1        0          1              1  43      198         170
## 549       1       1        0          1              1  70      146         132
## 550       0       0        0          1              1  56      112         162
## 551       0       1        0          1              1  43      106         157
## 552       1       1        1          0              0  72      194         154
## 553       1       0        1          1              0  66      179         188
## 554       0       0        0          0              1  74      166         137
## 555       1       1        0          1              1  72      193         130
## 556       1       1        1          1              1  78      114         155
## 557       0       0        1          1              1  77      193         170
## 558       0       1        1          1              0  49      109         190
## 559       0       1        1          0              1  74      169         194
## 560       1       1        0          0              1  69      192         142
## 561       0       1        1          1              0  78      177         137
## 562       0       0        1          0              0  67      124         188
## 563       0       1        0          1              1  51      104         187
## 564       1       1        1          1              1  43      181         138
## 565       1       0        0          0              1  60      138         192
## 566       1       1        1          1              1  61      139         133
## 567       0       1        1          1              0  68      161         147
## 568       1       1        0          1              1  66      154         139
## 569       0       1        0          1              0  72      170         149
## 570       1       0        1          1              0  58      187         186
## 571       0       1        1          1              0  73      154         153
## 572       0       0        0          0              0  73      136         169
## 573       0       1        0          0              1  67      114         146
## 574       0       0        0          1              0  59      125         193
## 575       1       1        0          0              1  78      183         178
## 576       0       0        0          0              0  63      139         197
## 577       1       0        1          0              0  51      156         182
## 578       1       0        1          1              1  79      172         193
## 579       0       0        0          0              1  43      173         141
## 580       0       0        1          1              1  50      196         196
## 581       1       0        1          0              1  61      126         193
## 582       0       1        0          1              1  62      148         166
## 583       0       1        1          1              1  63      153         132
## 584       0       1        0          1              1  44      115         177
## 585       1       1        0          1              1  64       92         134
## 586       1       1        0          0              0  40      142         163
## 587       1       1        0          1              0  56      132         167
## 588       0       1        1          0              1  60      105         188
## 589       0       1        0          1              1  69      158         146
## 590       0       1        0          1              0  63      182         172
## 591       0       1        1          1              0  69      169         153
## 592       0       0        1          1              1  45      167         167
## 593       0       1        0          1              0  47       96         167
## 594       0       1        0          0              1  72      179         164
## 595       1       0        0          0              1  42      143         157
## 596       0       1        1          1              1  73      124         199
## 597       1       1        1          1              1  67      123         154
## 598       0       1        0          0              0  43      196         171
## 599       0       1        0          1              1  69      101         175
## 600       0       1        1          0              0  49      130         130
## 601       0       0        1          0              0  46      130         135
## 602       1       1        1          0              0  71      144         178
## 603       0       0        1          0              1  74      183         164
## 604       1       1        1          0              0  74      116         184
## 605       1       1        0          1              0  61      152         172
## 606       0       0        1          1              1  42      184         148
## 607       0       0        1          1              0  66      163         147
## 608       0       0        1          1              1  50      149         182
## 609       1       1        1          1              1  49      197         158
## 610       1       1        0          1              1  49      106         196
## 611       0       0        1          0              0  47      196         171
## 612       0       1        0          1              1  68      134         137
## 613       0       0        1          0              0  74      124         196
## 614       0       1        1          1              1  54      145         140
## 615       1       1        1          0              0  61      192         135
## 616       1       0        0          0              1  70      162         163
## 617       0       1        1          1              1  40      190         164
## 618       1       0        0          0              0  66      125         157
## 619       0       0        1          1              1  55      178         149
## 620       0       1        1          0              0  70      185         149
## 621       0       1        1          1              1  44      117         169
## 622       1       1        0          1              0  69      130         183
## 623       0       1        1          1              1  64      115         182
## 624       0       0        0          1              0  56      166         188
## 625       1       1        0          1              0  50      153         147
## 626       0       1        1          0              0  49      166         188
## 627       1       0        1          1              0  62      177         144
## 628       1       1        0          0              1  42      158         152
## 629       0       1        0          1              1  42      174         178
## 630       0       0        0          0              1  69      175         189
## 631       1       0        1          1              1  79      128         195
## 632       1       0        0          0              1  40      138         172
## 633       0       0        0          1              1  59      160         155
## 634       0       0        1          0              0  49      186         137
## 635       0       1        0          0              0  46      139         172
## 636       0       0        1          0              0  72      115         131
## 637       1       0        0          1              0  67      174         189
## 638       0       1        0          0              0  62      140         157
## 639       1       1        0          0              0  69      119         181
## 640       1       0        1          1              0  47      165         160
## 641       0       0        1          0              1  74      118         187
## 642       0       0        1          0              0  53      132         148
## 643       1       1        0          0              0  61      164         153
## 644       0       1        0          0              0  45      139         132
## 645       0       0        0          0              1  65      183         148
## 646       1       1        0          0              1  54      100         161
## 647       0       0        1          0              0  40      141         173
## 648       0       0        0          0              0  46      168         144
## 649       1       0        1          1              0  52      126         133
## 650       1       0        0          1              0  57      145         143
## 651       0       0        0          0              0  48      151         181
## 652       1       0        0          1              1  61      130         185
## 653       1       1        1          1              0  74      197         147
## 654       0       1        1          1              1  56      130         138
## 655       0       0        0          0              1  42      190         157
## 656       1       1        1          1              1  61      178         162
## 657       0       0        1          0              1  66      169         138
## 658       1       1        1          1              1  43      175         153
## 659       1       1        0          1              0  58      100         189
## 660       0       1        1          0              1  79      127         155
## 661       1       1        0          0              0  77      118         152
## 662       0       1        1          0              0  61      139         181
## 663       0       1        0          0              0  71      185         161
## 664       1       1        1          0              1  70      113         147
## 665       1       1        1          0              0  67      184         163
## 666       1       1        1          0              1  62      102         151
## 667       1       0        1          0              0  48      108         145
## 668       1       1        1          1              1  52      146         130
## 669       1       1        1          0              0  78       91         200
## 670       1       1        1          1              1  58      122         143
## 671       1       1        1          1              1  75      107         172
## 672       0       1        0          0              1  77      133         173
## 673       0       1        0          1              1  73      159         190
## 674       1       0        0          1              0  79      116         149
## 675       0       0        0          1              1  76      146         168
## 676       1       0        0          0              0  72      104         172
## 677       0       0        1          0              1  67      135         182
## 678       0       1        1          0              0  44      123         196
## 679       1       1        0          0              1  60      157         171
## 680       1       0        1          1              0  43      169         150
## 681       1       1        1          0              1  42      165         186
## 682       0       1        0          0              1  70      111         134
## 683       1       1        1          0              0  49      136         164
## 684       1       0        0          1              0  70      174         151
## 685       1       1        0          0              0  48      157         181
## 686       0       1        1          1              0  46      109         158
## 687       1       0        1          1              0  60      150         142
## 688       1       1        0          1              1  63      173         133
## 689       1       0        1          1              0  54      160         195
## 690       0       1        1          0              1  71      139         138
## 691       0       0        0          0              1  67      192         132
## 692       1       0        1          0              0  69      161         143
## 693       0       1        1          0              0  65      103         169
## 694       1       1        0          0              1  55      147         156
## 695       0       1        1          0              0  68      168         184
## 696       0       1        1          1              1  49      112         132
## 697       0       1        0          0              1  58      143         168
## 698       1       0        1          1              1  66      110         149
## 699       1       0        0          0              0  75      179         131
## 700       0       1        1          0              0  71      162         144
## 701       0       0        0          0              0  56      161         198
## 702       0       1        1          1              1  43      153         135
## 703       1       0        0          1              0  75      117         199
## 704       0       0        0          0              0  58      190         163
## 705       0       1        1          1              1  79      158         176
## 706       0       0        0          0              0  51      165         174
## 707       0       1        0          1              1  65      144         191
## 708       0       1        0          1              0  42      146         188
## 709       1       1        0          1              1  45      180         165
## 710       1       0        0          0              1  41      195         159
## 711       1       1        1          1              0  57       90         157
## 712       1       1        1          0              0  52      129         147
## 713       1       0        0          0              0  46      192         193
## 714       0       0        1          1              0  51      181         158
## 715       0       0        0          0              0  49      191         157
## 716       1       0        1          1              0  59      147         137
## 717       1       0        1          1              1  56      156         193
## 718       0       0        0          0              0  50      108         153
## 719       0       0        0          0              1  57      172         175
## 720       1       1        1          0              0  67      136         149
## 721       0       1        1          0              0  45      130         145
## 722       1       1        1          1              0  59      137         170
## 723       0       1        0          0              1  53      135         134
## 724       0       1        0          0              1  42      184         194
## 725       1       0        1          1              1  76      147         169
## 726       1       0        1          1              1  78      166         188
## 727       0       0        1          1              1  61      190         192
## 728       1       1        1          0              0  44      161         135
## 729       1       1        1          0              0  64      113         143
## 730       0       0        1          0              0  77      150         145
## 731       1       1        0          0              1  57      103         181
## 732       1       1        0          0              0  44      123         183
## 733       0       1        0          1              1  64      144         139
## 734       0       0        1          1              0  48      194         169
## 735       1       1        1          0              1  46      169         171
## 736       0       1        1          0              0  59      102         200
## 737       0       0        1          0              1  54      123         198
## 738       0       1        0          0              1  68      142         162
## 739       1       0        0          0              1  59      154         132
## 740       0       1        0          0              0  79       97         147
## 741       0       1        1          1              1  71      199         134
## 742       0       0        1          1              0  43       90         156
## 743       1       1        1          0              1  52      134         143
## 744       1       0        0          0              0  40      121         186
## 745       1       1        1          1              1  40      139         166
## 746       1       0        0          0              0  40      107         194
## 747       0       0        1          0              0  63      141         169
## 748       0       1        0          0              1  77      112         186
## 749       1       0        0          1              1  73       90         134
## 750       0       1        1          0              0  56      164         200
## 751       1       1        0          1              0  72      123         163
## 752       0       0        1          1              1  63      191         170
## 753       0       1        0          1              0  78       90         176
## 754       1       1        0          1              1  54       93         133
## 755       0       1        1          0              0  55      116         169
## 756       1       0        1          0              0  69      178         180
## 757       0       0        1          0              1  76      140         165
## 758       0       1        1          0              0  70      179         153
## 759       1       0        1          0              0  78      175         168
## 760       0       1        1          1              0  68      108         159
## 761       0       1        1          1              1  66      166         154
## 762       0       0        1          1              0  54      102         154
## 763       1       1        0          0              1  67      156         133
## 764       0       0        0          1              0  53      119         182
## 765       0       0        1          1              0  50      200         166
## 766       0       0        1          1              1  77      115         181
## 767       1       1        0          1              0  78      101         168
## 768       1       1        1          0              1  59      108         168
## 769       1       1        0          0              0  79      146         163
## 770       0       0        0          0              0  78      154         166
## 771       0       0        0          0              0  65      144         194
## 772       0       0        1          0              1  79      128         130
## 773       0       1        0          0              0  63      161         135
## 774       0       1        1          1              0  65      121         146
## 775       1       1        0          0              0  47      142         153
## 776       1       1        1          0              1  49      118         192
## 777       0       1        0          1              1  50      153         187
## 778       0       0        0          1              0  40      198         190
## 779       0       1        1          1              0  61      120         184
## 780       1       0        0          1              1  42       98         166
## 781       0       1        0          1              1  64      103         169
## 782       1       1        1          1              0  47      112         189
## 783       1       0        0          0              0  69      113         197
## 784       1       1        0          1              0  43      169         191
## 785       1       0        0          0              0  77      118         192
## 786       0       0        1          0              0  58      163         187
## 787       0       0        0          1              0  72      121         147
## 788       0       1        1          0              0  47      129         154
## 789       0       0        1          1              1  76       94         191
## 790       1       0        1          1              1  70       99         150
## 791       1       1        1          1              0  56      136         189
## 792       1       1        0          1              1  46      164         160
## 793       0       1        1          1              0  52      159         180
## 794       0       1        0          1              0  65      111         172
## 795       0       0        0          0              0  66       96         190
## 796       0       0        0          1              0  42      100         144
## 797       1       0        1          0              0  43      188         175
## 798       1       0        1          1              0  71      155         181
## 799       1       1        0          1              0  60      153         162
## 800       1       1        1          1              1  43      133         158
## 801       0       0        1          1              0  79      159         194
## 802       1       1        0          1              1  40      124         188
## 803       1       1        1          1              1  40      171         181
## 804       0       1        0          0              0  46      116         141
## 805       0       1        0          0              0  40      146         137
## 806       1       0        1          1              1  50      176         136
## 807       0       0        1          1              1  75      107         190
## 808       1       1        0          1              0  62      112         177
## 809       0       1        1          1              0  46       93         179
## 810       0       1        0          0              0  51      200         144
## 811       1       1        0          1              0  70      119         150
## 812       0       1        0          1              1  50      123         136
## 813       0       0        1          0              0  75      188         137
## 814       1       0        0          0              1  60      147         161
## 815       0       1        1          1              0  52      145         136
## 816       1       0        1          0              0  75      184         194
## 817       0       0        0          1              1  69      178         164
## 818       0       1        1          1              1  44      146         170
## 819       1       0        1          0              0  76       96         177
## 820       1       0        0          1              0  51       91         198
## 821       0       1        1          0              1  44      140         177
## 822       0       1        1          1              1  58      156         194
## 823       0       1        0          1              1  78      130         164
## 824       1       1        1          1              0  59      122         159
## 825       1       1        1          1              0  43      136         169
## 826       0       1        0          1              0  69      153         180
## 827       0       1        0          1              1  48      102         167
## 828       0       0        0          1              1  63      182         159
## 829       0       0        1          0              0  47      198         145
## 830       0       0        1          1              0  50      131         179
## 831       0       0        0          0              0  58      100         182
## 832       0       0        0          1              1  54       90         183
## 833       1       0        1          1              0  42      154         150
## 834       0       1        0          0              0  57      112         146
## 835       0       0        0          1              0  40      171         140
## 836       0       0        1          0              1  49      123         158
## 837       1       1        1          0              1  40      118         189
## 838       1       0        1          1              0  40      117         151
## 839       0       0        0          1              0  48       91         146
## 840       0       1        1          1              0  54      112         151
## 841       0       1        1          0              0  51      107         184
## 842       1       0        0          1              1  51      164         180
## 843       0       1        1          0              0  49      198         198
## 844       1       1        1          1              1  63      191         165
## 845       1       0        0          1              0  73      101         170
## 846       0       1        1          1              1  60       94         195
## 847       1       0        0          0              1  75      182         179
## 848       1       0        0          1              1  55      117         160
## 849       0       0        1          1              1  77      137         194
## 850       1       1        0          0              1  79      115         182
## 851       1       1        1          1              0  67      200         170
## 852       1       0        0          1              1  59      123         183
## 853       0       0        0          1              0  63      127         190
## 854       1       1        0          0              1  62      110         195
## 855       0       0        0          0              0  52      150         184
## 856       1       0        1          0              0  56      161         180
## 857       0       1        0          0              0  48      186         163
## 858       1       1        1          0              0  55      129         147
## 859       0       0        0          1              0  52      137         154
## 860       0       0        1          0              1  68      163         178
## 861       0       1        0          0              0  57      159         169
## 862       1       1        0          0              0  54      198         166
## 863       1       0        1          0              0  55      170         194
## 864       0       1        1          1              1  58      111         200
## 865       1       0        0          0              0  71      144         199
## 866       1       0        0          0              0  62      107         170
## 867       0       1        0          1              1  75      120         174
## 868       0       1        1          1              0  50      197         163
## 869       0       0        1          1              1  63      183         152
## 870       0       0        1          0              0  56      170         172
## 871       1       1        0          1              1  64      133         176
## 872       1       1        1          1              0  60      119         147
## 873       0       0        1          1              0  61      116         181
## 874       1       1        1          0              1  45      154         139
## 875       1       0        0          1              0  42      114         172
## 876       0       1        0          0              0  71      188         171
## 877       0       1        1          1              0  65      195         149
## 878       0       0        1          1              1  40      186         144
## 879       0       1        1          0              1  46      129         153
## 880       0       0        0          1              0  50      180         131
## 881       1       1        1          1              1  55      141         158
## 882       0       0        0          0              1  52      192         170
## 883       0       1        0          0              1  57       99         182
## 884       1       0        1          0              1  61      170         185
## 885       0       1        0          0              0  76      157         136
## 886       1       0        0          1              1  59      189         141
## 887       0       0        0          0              1  48       96         134
## 888       0       0        1          0              0  41      108         183
## 889       1       1        1          0              1  56      125         161
## 890       1       1        0          1              0  61      171         144
## 891       1       1        1          0              1  47      138         197
## 892       1       1        1          0              1  51      161         194
## 893       1       1        0          0              1  48      152         186
## 894       0       0        0          0              1  52      128         146
## 895       1       1        1          0              1  40      144         185
## 896       1       0        1          1              0  52      101         137
## 897       0       0        1          0              0  79      103         133
## 898       0       1        1          1              0  61      119         188
## 899       1       1        0          0              0  53      181         149
## 900       0       0        1          1              0  42      114         144
## 901       0       0        1          1              1  54      180         137
## 902       1       0        0          1              1  65      110         134
## 903       0       1        0          1              1  61      160         199
## 904       1       1        1          0              1  47      176         156
## 905       1       1        0          0              0  57      134         141
## 906       0       1        0          0              0  46      149         161
## 907       1       0        0          1              1  78      153         134
## 908       1       1        1          0              0  52       90         174
## 909       0       0        1          1              1  50       94         192
## 910       1       0        0          1              0  79      196         170
## 911       0       1        1          0              1  57       96         133
## 912       1       1        1          0              1  40      117         157
## 913       1       0        1          1              1  78      121         184
## 914       1       1        0          1              0  45      153         199
## 915       1       1        0          0              1  52      142         151
## 916       1       0        1          0              1  42      125         185
## 917       0       1        1          0              1  54      141         163
## 918       1       0        1          1              0  75      122         170
## 919       1       0        0          0              0  62      175         141
## 920       1       0        1          0              1  58      159         173
## 921       0       1        0          1              1  44      165         148
## 922       0       0        1          1              1  68      125         167
## 923       0       0        1          1              0  45      111         140
## 924       0       0        1          0              0  46      115         163
## 925       0       0        0          0              1  51      103         168
## 926       1       1        1          0              1  67      185         193
## 927       1       0        1          0              1  41      115         155
## 928       1       1        0          0              0  77       97         179
## 929       1       0        0          0              1  45      112         194
## 930       1       0        1          1              0  47      109         199
## 931       0       1        1          1              0  57      107         158
## 932       0       0        1          1              1  54      192         181
## 933       0       0        0          1              0  66      117         192
## 934       1       0        0          1              1  65      116         151
## 935       0       1        0          1              0  61      107         180
## 936       1       1        1          0              0  65      129         135
## 937       0       1        1          0              0  74      158         197
## 938       0       0        1          1              1  71      187         184
## 939       0       0        1          0              0  77      141         133
## 940       0       1        0          1              1  77      107         155
## 941       1       1        1          1              1  67      102         159
## 942       0       0        1          1              0  52      128         144
## 943       0       0        0          1              0  58      156         141
## 944       1       0        0          1              1  56      136         143
## 945       0       0        0          0              1  72      196         150
## 946       1       0        0          1              0  43      189         144
## 947       0       0        0          0              1  56      118         185
## 948       1       0        0          0              1  48      185         131
## 949       1       1        1          1              1  79      104         169
## 950       0       1        0          1              1  57      138         139
## 951       0       1        0          0              1  51       96         162
## 952       0       0        0          1              1  78      189         189
## 953       1       1        0          1              1  67      134         164
## 954       0       1        0          1              1  69      133         164
## 955       1       1        1          0              0  55      184         179
## 956       1       0        1          1              0  42      167         153
## 957       1       1        1          1              1  65      137         133
## 958       1       1        1          0              0  59      126         169
## 959       0       1        1          0              1  40      171         158
## 960       0       0        1          1              1  46      156         192
## 961       0       1        1          0              0  46      177         138
## 962       1       0        0          0              0  52      116         144
## 963       1       0        1          1              0  71      188         191
## 964       1       1        1          1              0  46      103         133
## 965       1       0        1          1              0  40      152         166
## 966       1       0        1          0              1  54      162         166
## 967       1       1        1          1              1  52      190         162
## 968       1       1        1          1              1  64      145         197
## 969       0       0        1          1              0  43      169         177
## 970       1       1        0          0              0  73      173         200
## 971       1       1        1          0              0  66      162         146
## 972       1       1        1          0              1  42      181         155
## 973       1       1        1          1              1  67      178         156
## 974       1       1        0          0              1  65      135         136
## 975       0       0        1          0              0  58      105         135
## 976       0       0        0          1              1  48      196         177
## 977       0       1        1          1              1  68      162         137
## 978       0       1        0          1              1  45      184         146
## 979       1       1        1          0              1  77      160         142
## 980       1       1        0          1              0  78      182         179
## 981       0       0        1          1              1  79      134         194
## 982       0       0        0          1              0  59      181         174
## 983       1       0        0          0              1  58      129         165
## 984       0       1        0          1              0  76      114         186
## 985       0       0        1          1              1  70      125         189
## 986       1       0        0          1              0  62      177         145
## 987       1       1        0          0              1  56      127         135
## 988       1       0        0          1              0  41      116         140
## 989       1       1        1          1              0  75      172         160
## 990       1       0        0          1              1  53      104         151
## 991       0       1        1          0              0  48      174         193
## 992       1       0        1          1              1  48      120         150
## 993       1       1        0          1              1  65      100         191
## 994       1       0        0          0              1  51      125         140
## 995       1       0        0          0              0  47      179         164
## 996       1       0        1          1              1  42      110         175
## 997       0       1        0          1              0  75      123         130
## 998       1       1        1          0              1  76      178         179
## 999       0       1        1          1              0  65      139         169
## 1000      0       1        0          0              0  57      182         146
##      HDL Risk
## 1     32 11.1
## 2     59 30.1
## 3     59 37.6
## 4     46 13.2
## 5     63 15.1
## 6     22 17.3
## 7     52  2.1
## 8     59 46.0
## 9     99  1.7
## 10    46 48.5
## 11    48 50.3
## 12    44 22.9
## 13    48  1.0
## 14    71 19.7
## 15    95  5.1
## 16    72 17.0
## 17    40 24.2
## 18    51 32.0
## 19    92 53.0
## 20    48 37.3
## 21    46 10.3
## 22    57  7.9
## 23    62  4.9
## 24    23 44.6
## 25    68 23.9
## 26    38 28.8
## 27    24 18.5
## 28    90 13.5
## 29    23 32.6
## 30    96 18.8
## 31    49  9.0
## 32    93  0.4
## 33    88  3.7
## 34    50 24.0
## 35    81 25.1
## 36    93 23.3
## 37    91 56.4
## 38    70  8.1
## 39    86 15.6
## 40    50  9.7
## 41    41 60.5
## 42    90  8.5
## 43    97  8.0
## 44    45  1.4
## 45    58  0.1
## 46    72 23.7
## 47    30 23.7
## 48    42  3.2
## 49    39  0.8
## 50    65 68.5
## 51    32 33.9
## 52    87 35.0
## 53    29 19.6
## 54    21 17.5
## 55    39 51.2
## 56    76  1.6
## 57    52 49.4
## 58    49  1.3
## 59    35 13.4
## 60    93 13.1
## 61    89  1.6
## 62    65  5.2
## 63    61  4.1
## 64    20 60.8
## 65    54 16.9
## 66    63 32.1
## 67    45 10.4
## 68    46  9.4
## 69    98 22.5
## 70    95  7.4
## 71    48  8.2
## 72    82 19.9
## 73    90 55.2
## 74    23 29.1
## 75    39  3.4
## 76    29 24.1
## 77    23  7.0
## 78   100  2.3
## 79    39 32.5
## 80    93 56.1
## 81    33 27.3
## 82    50 44.8
## 83    62  4.9
## 84    79 22.2
## 85    57 10.5
## 86    21 32.0
## 87    44 10.8
## 88    23  6.6
## 89    86  2.5
## 90    34 11.4
## 91    68  0.9
## 92    79 18.3
## 93    27 17.7
## 94    31 29.0
## 95    90 29.5
## 96    94  0.7
## 97    54 11.7
## 98    30 55.3
## 99    63  3.3
## 100   64 52.5
## 101   91  4.2
## 102   28 26.9
## 103   38  1.7
## 104   97 18.0
## 105   28  8.3
## 106   78 43.1
## 107   85  3.1
## 108   51  1.2
## 109   29 15.6
## 110   92  3.1
## 111   33  3.0
## 112   99 37.0
## 113   94  5.2
## 114   37 36.7
## 115   62  0.8
## 116   70 11.6
## 117   92  0.3
## 118   40  6.7
## 119   53 78.1
## 120   49 54.5
## 121   69 10.4
## 122   57 12.0
## 123   30 28.6
## 124   46  6.1
## 125   61 11.9
## 126   30 12.8
## 127   77  4.8
## 128   44  4.3
## 129   28 15.6
## 130   25 25.3
## 131   93  1.9
## 132   71  2.5
## 133   85 27.7
## 134   65 46.5
## 135   25 20.3
## 136   27 35.5
## 137   81 14.1
## 138   46  4.9
## 139   97  3.2
## 140   36 13.7
## 141   39 24.0
## 142   64  4.3
## 143   96 37.1
## 144   78 32.2
## 145   69 21.2
## 146   41 10.3
## 147   64 17.9
## 148   61 43.4
## 149   96  7.1
## 150   81  1.5
## 151   50 32.5
## 152   55 49.7
## 153   55  5.2
## 154   38  7.1
## 155   69  5.5
## 156   23 43.0
## 157   77 23.7
## 158   50  5.9
## 159   65 33.8
## 160   31  5.8
## 161   97 23.4
## 162   38 32.3
## 163   77 30.1
## 164   42 15.9
## 165   33 17.0
## 166   69  5.0
## 167   86  0.6
## 168   23  7.9
## 169   63  3.8
## 170   63 48.3
## 171   29 46.4
## 172   22 11.4
## 173   87  9.2
## 174   99 24.5
## 175   39 23.1
## 176   80  1.3
## 177   91  5.3
## 178   50 35.0
## 179   34  5.1
## 180   63  3.3
## 181   22 10.9
## 182   51  0.4
## 183   20 35.2
## 184   35  5.4
## 185   96  5.4
## 186   64 26.2
## 187   85 10.1
## 188   88 37.2
## 189   29 10.7
## 190   72 23.4
## 191   74 36.2
## 192   65 14.6
## 193   93  7.4
## 194   74 42.9
## 195   83  4.6
## 196   34 24.3
## 197   95  6.7
## 198   93 16.4
## 199   79 54.4
## 200   59  0.3
## 201   45 65.0
## 202   66 20.4
## 203   60 10.8
## 204   96  9.8
## 205   76 51.1
## 206   42 29.6
## 207   56 12.1
## 208   44 42.0
## 209   72 38.5
## 210   75  9.6
## 211   25 33.0
## 212   58 27.3
## 213   79 62.4
## 214   71  4.8
## 215   63 20.8
## 216   21  4.0
## 217   60 29.2
## 218   25 58.2
## 219   20 18.2
## 220   64 24.4
## 221   56 43.3
## 222   96 18.1
## 223   97  0.3
## 224   59 17.1
## 225   44 70.6
## 226   64 15.8
## 227   83 13.0
## 228   45 41.1
## 229   61 28.1
## 230   73 23.7
## 231   35 31.1
## 232   59 18.3
## 233   72 13.3
## 234   83 48.9
## 235   26  9.9
## 236   57  5.3
## 237   66 45.9
## 238   52 17.1
## 239   47 10.6
## 240   62 50.5
## 241   54  4.2
## 242   92 16.1
## 243   33 15.5
## 244   85  0.7
## 245   50  6.6
## 246   83  4.9
## 247   39  4.2
## 248   69 33.8
## 249  100  5.4
## 250   28  1.8
## 251   96  9.1
## 252   67  8.6
## 253   41  4.9
## 254   48 11.2
## 255   94 14.1
## 256   29 18.7
## 257   70  2.6
## 258   56 27.6
## 259   44  7.9
## 260   99  8.6
## 261   80  0.4
## 262   94 16.2
## 263   80  3.2
## 264   84  1.5
## 265   26 13.1
## 266   54  0.6
## 267   73  8.7
## 268   37 15.2
## 269   69 49.6
## 270   80  4.8
## 271   81 47.3
## 272   46 38.0
## 273   40  7.8
## 274   72  6.0
## 275   61 10.1
## 276   69  5.3
## 277  100 33.1
## 278   90 28.4
## 279   84 23.2
## 280   72  3.8
## 281   41 28.0
## 282   46 17.4
## 283   25 23.6
## 284   41 40.9
## 285   29  3.9
## 286   78  7.1
## 287   76  0.9
## 288   64 11.4
## 289   84 11.9
## 290   21 32.0
## 291   94 19.8
## 292   98  5.9
## 293   38 52.1
## 294   21 71.2
## 295   58 20.3
## 296   37  7.8
## 297   95 22.1
## 298   33 17.2
## 299   81 11.0
## 300   40  9.9
## 301   95  5.0
## 302   28 34.1
## 303   85 15.7
## 304   20  9.8
## 305   53  8.7
## 306   32 47.4
## 307   48 25.2
## 308   27  6.3
## 309   63 55.7
## 310   78 11.8
## 311   97  0.6
## 312   88 56.5
## 313   99 27.0
## 314   88 67.5
## 315   48 21.1
## 316   97 17.0
## 317   46 12.8
## 318   99  9.4
## 319   93  7.4
## 320   92 11.0
## 321   48 24.4
## 322   72 62.7
## 323   30 55.6
## 324  100 15.0
## 325   58 31.0
## 326   40 14.3
## 327   95  1.5
## 328   31 45.1
## 329   54 17.1
## 330   31 42.4
## 331   99 10.6
## 332   24 45.4
## 333   76 30.9
## 334   45  2.2
## 335   46 16.4
## 336   86  4.1
## 337   81  1.3
## 338   58  5.2
## 339   84 11.6
## 340   43  4.6
## 341   46  2.3
## 342   96  7.7
## 343   64  8.3
## 344   65  1.1
## 345   65 24.2
## 346   36 23.0
## 347   69  2.3
## 348   84  2.4
## 349   90 12.1
## 350   90  3.1
## 351   91  7.5
## 352   67 36.2
## 353   38  4.5
## 354   44 35.3
## 355   54  4.2
## 356   61  2.3
## 357   80  7.9
## 358   83  0.6
## 359   90  0.8
## 360   63 40.7
## 361   86 18.8
## 362   35 20.8
## 363   38 14.0
## 364   45 20.8
## 365   28 21.7
## 366   50  8.6
## 367   58  1.1
## 368   38  9.5
## 369   76 17.9
## 370   72 68.0
## 371   53  8.1
## 372   22  8.5
## 373   27 36.5
## 374   73  1.6
## 375   50  8.6
## 376   41 27.9
## 377   76 13.3
## 378   33 15.7
## 379   35 27.3
## 380   92 16.4
## 381   93  9.6
## 382   60 25.1
## 383   75  7.0
## 384   83 48.1
## 385   93 32.6
## 386   98 20.6
## 387   84 23.7
## 388   47 26.5
## 389  100 35.5
## 390   61 22.2
## 391   31 24.0
## 392   51 30.3
## 393   50  3.6
## 394   92  1.3
## 395   33 52.0
## 396   36 17.8
## 397   75  4.4
## 398   33 23.9
## 399   69  7.3
## 400   42 41.2
## 401   49 10.7
## 402   34 21.6
## 403   80 12.7
## 404   64 14.4
## 405   38  8.7
## 406   88  1.8
## 407   60 39.8
## 408   30 11.8
## 409   73 56.1
## 410   41 27.8
## 411   26 30.5
## 412   41  4.6
## 413   47  2.4
## 414   91  0.3
## 415   23 25.9
## 416   26 40.6
## 417   48  5.6
## 418   53 25.4
## 419   99 36.1
## 420   85  5.9
## 421   69  4.4
## 422   70 12.7
## 423   91  3.1
## 424   70  4.2
## 425   38  5.8
## 426   65 35.9
## 427   41 26.4
## 428   77 15.5
## 429   44 13.2
## 430   46  6.0
## 431   53 29.0
## 432   66  5.6
## 433   84 38.1
## 434   27 14.6
## 435   85 20.9
## 436   69 12.7
## 437   43 23.3
## 438   26 34.5
## 439   60  8.4
## 440   33 11.7
## 441   33 27.7
## 442   44  5.2
## 443   20 38.8
## 444   28 13.5
## 445   88 50.7
## 446   59 50.6
## 447   36  1.2
## 448   85 56.8
## 449   79 12.7
## 450   33 18.2
## 451   49  5.2
## 452   89  2.2
## 453   97 13.8
## 454   73 20.9
## 455   63 14.1
## 456   25 46.8
## 457   45  2.5
## 458   59  1.7
## 459   31 21.6
## 460   95 16.7
## 461   81 30.1
## 462   76 13.3
## 463   47 16.8
## 464   47  4.8
## 465   89  7.9
## 466   29  9.8
## 467   80 34.3
## 468   80 21.8
## 469   94 30.0
## 470   55 13.8
## 471  100 21.0
## 472   62 11.1
## 473   39 22.0
## 474   49 23.7
## 475   26 72.6
## 476   23  5.7
## 477   40 11.0
## 478   87  2.6
## 479   49 34.5
## 480   38 44.2
## 481   70  0.1
## 482   82  7.8
## 483   45 31.1
## 484   40  9.4
## 485   25 22.7
## 486   68 69.5
## 487   91 46.0
## 488   93  3.2
## 489   78 61.3
## 490   47 15.0
## 491   54 25.1
## 492   41  1.2
## 493   86  1.5
## 494   47 12.1
## 495   82 10.5
## 496   51 55.7
## 497   90  7.7
## 498   97  3.2
## 499  100 16.4
## 500   90  5.4
## 501   93 37.5
## 502   79  1.7
## 503   55  1.5
## 504   79  0.1
## 505   91  0.1
## 506   27  3.8
## 507   65  5.6
## 508   79  0.2
## 509   58 19.7
## 510   93  4.1
## 511   38 69.3
## 512   37  6.2
## 513   64 13.5
## 514   46 41.9
## 515   84 19.4
## 516   95  1.0
## 517   48 42.5
## 518   31 26.5
## 519   76 37.9
## 520   65 63.1
## 521   56 21.4
## 522   53 43.1
## 523   88  0.9
## 524   99  8.4
## 525   38  2.6
## 526  100 51.8
## 527   69 22.2
## 528   45  8.7
## 529   91  1.0
## 530   75 17.6
## 531   60 65.9
## 532   24  3.6
## 533   37 31.3
## 534   28  8.4
## 535   29 20.7
## 536   53 15.4
## 537   32 11.5
## 538   88 71.0
## 539   73  7.4
## 540  100 11.7
## 541   71  5.0
## 542   75 50.5
## 543   23 44.6
## 544   45 30.2
## 545   49 11.5
## 546   85 13.1
## 547   73  8.1
## 548   69 20.6
## 549   77 32.8
## 550   56  3.5
## 551   74  0.8
## 552   84 32.2
## 553   84 39.3
## 554   58 28.7
## 555   59 54.5
## 556   80 44.1
## 557   81 85.4
## 558   41  6.1
## 559   75 41.0
## 560   92 28.1
## 561   24 50.7
## 562   33 12.9
## 563   93  1.8
## 564   61 28.8
## 565   32 15.0
## 566   57 38.8
## 567   24 46.9
## 568   98 30.2
## 569   58 35.5
## 570   63 31.6
## 571   64 56.1
## 572   63 14.0
## 573   39  6.2
## 574   21 10.4
## 575   40 43.1
## 576   85  4.2
## 577   78  6.5
## 578   98 69.4
## 579   77  0.5
## 580   76 13.4
## 581   37 19.1
## 582   87 17.6
## 583   71 32.8
## 584   59  2.3
## 585   52 13.9
## 586   86  3.0
## 587   98 10.6
## 588   86  5.5
## 589   24 27.6
## 590   20 40.8
## 591   28 51.9
## 592   24 29.4
## 593   86  0.4
## 594   31 17.1
## 595   46  1.4
## 596   22 40.5
## 597   95 35.4
## 598   57  5.3
## 599   26 14.2
## 600   60  2.1
## 601   50  1.8
## 602   88 20.2
## 603   66 46.9
## 604   28 21.5
## 605  100 16.4
## 606   71  4.8
## 607   90 22.7
## 608   47 12.8
## 609   44 47.1
## 610   72  9.8
## 611   49  6.4
## 612   48 19.0
## 613   26 20.1
## 614   50 24.3
## 615   78 22.0
## 616   90 21.5
## 617   67 28.0
## 618   39 12.9
## 619   99 10.0
## 620   34 30.7
## 621   40 10.0
## 622   38 22.2
## 623   44 27.5
## 624   97  4.0
## 625   24 15.1
## 626   61  7.8
## 627   42 37.9
## 628   94  6.5
## 629   40 32.0
## 630   97 17.6
## 631   98 50.2
## 632   67  0.7
## 633   59  9.1
## 634   26 10.5
## 635   97  0.6
## 636   60 13.4
## 637   58 37.4
## 638   45  6.8
## 639   96  8.1
## 640   28 23.7
## 641   27 23.8
## 642   87  2.0
## 643   35 13.4
## 644   68  0.7
## 645   79 10.9
## 646   20  8.2
## 647   62  1.7
## 648   57  0.8
## 649   58  8.2
## 650   76  7.8
## 651   55  1.2
## 652   99 12.2
## 653   83 55.3
## 654   36 24.5
## 655   77  0.7
## 656   88 51.9
## 657   52 20.9
## 658   91 25.2
## 659   91  7.5
## 660   38 22.0
## 661   53 11.6
## 662   71 10.9
## 663   24 17.3
## 664   22 28.5
## 665   20 37.3
## 666   71 13.4
## 667   32  5.1
## 668   77 28.1
## 669   63 13.2
## 670   52 29.9
## 671   40 45.0
## 672   40 13.3
## 673   65 40.2
## 674   95 38.1
## 675   36 47.6
## 676   69 12.2
## 677   25 21.2
## 678   74  0.9
## 679   97 15.2
## 680   94  3.8
## 681   60 14.4
## 682   47  6.3
## 683   92  7.5
## 684   60 40.6
## 685   40  7.1
## 686   69  1.5
## 687   97 15.2
## 688   83 34.0
## 689   99 14.0
## 690   64 20.2
## 691   25 21.1
## 692   58 23.6
## 693   53  8.0
## 694   41 13.8
## 695   87 28.8
## 696   65  4.5
## 697   29 11.3
## 698   25 34.6
## 699   83 30.9
## 700   55 26.1
## 701   90  2.2
## 702   66 10.4
## 703   82 32.5
## 704   76  3.7
## 705   74 73.4
## 706   66  1.5
## 707   75 23.7
## 708   62  2.5
## 709   28 25.3
## 710   39  2.8
## 711   30 13.3
## 712   61  8.6
## 713   70  2.9
## 714   84  6.3
## 715   44  2.4
## 716   57 19.1
## 717   31 42.2
## 718   72  0.4
## 719   25  9.8
## 720   99 14.9
## 721   36  3.7
## 722   93 21.4
## 723   86  2.2
## 724   71  6.4
## 725   37 62.7
## 726   58 70.1
## 727   86 28.0
## 728   79  7.7
## 729   68 10.8
## 730   47 33.9
## 731   28  9.2
## 732   60  3.3
## 733   85 16.5
## 734   56 10.2
## 735   86 16.3
## 736   26  7.7
## 737   69  4.4
## 738   38 11.1
## 739   24 14.6
## 740   84 15.4
## 741   25 58.9
## 742   26  5.7
## 743   38 17.1
## 744   38  1.3
## 745   69 16.0
## 746   20  2.9
## 747   60  9.0
## 748   74 15.3
## 749   23 29.0
## 750   49 16.6
## 751  100 16.5
## 752   53 38.7
## 753   45 23.5
## 754   37 10.5
## 755  100  2.2
## 756   28 41.0
## 757   94 37.3
## 758   95 32.1
## 759   99 38.4
## 760   29 25.0
## 761   36 52.7
## 762   67  3.4
## 763   47 22.0
## 764   48  2.8
## 765   35 20.5
## 766   83 49.3
## 767   71 15.9
## 768   29 17.6
## 769   82 15.9
## 770   61 30.3
## 771   57  6.8
## 772   45 39.3
## 773   97  6.7
## 774   29 25.6
## 775   86  4.3
## 776   45 12.3
## 777   22 45.0
## 778   60  2.0
## 779   44 21.3
## 780   82  0.7
## 781   99  8.4
## 782   53 11.0
## 783   34 16.3
## 784   69 10.0
## 785   92 20.6
## 786   77  6.9
## 787   54 18.3
## 788   69  1.5
## 789   66 32.9
## 790   47 26.7
## 791   74 20.7
## 792   91 16.3
## 793   40 29.4
## 794   77 10.5
## 795   23  4.9
## 796   86  0.2
## 797   65  5.1
## 798   50 48.5
## 799   23 23.5
## 800   56 18.3
## 801   21 60.4
## 802   33 10.0
## 803   23 31.2
## 804   28  1.9
## 805   29  3.7
## 806   24 33.2
## 807   96 37.6
## 808   21 16.0
## 809   22  6.2
## 810   37 14.0
## 811   94 14.6
## 812   26 13.3
## 813   97 41.3
## 814   80  7.5
## 815   66  9.7
## 816   56 42.8
## 817   55 34.5
## 818   75  8.7
## 819   75 15.1
## 820   41  4.9
## 821   73  3.3
## 822   48 44.8
## 823   82 38.7
## 824   58 19.8
## 825   40 13.1
## 826   56 29.9
## 827   25  7.5
## 828   84 15.6
## 829   55  4.3
## 830   99  3.0
## 831   70  1.3
## 832   34  3.4
## 833   38  9.5
## 834   96  1.2
## 835   44  1.5
## 836   43  4.1
## 837   30  8.6
## 838   39  4.9
## 839   69  0.5
## 840   60  6.0
## 841   55  2.0
## 842   82  6.9
## 843   86 10.0
## 844   83 60.5
## 845   34 29.0
## 846   46 13.6
## 847   22 55.3
## 848   29 13.0
## 849   70 61.4
## 850   39 21.5
## 851   28 61.1
## 852   23 25.3
## 853   64  7.6
## 854   86  9.3
## 855   86  1.1
## 856   82  9.6
## 857   85  3.1
## 858   21 13.4
## 859   84  1.5
## 860   49 25.9
## 861   54  6.7
## 862   94 10.7
## 863   20 32.8
## 864   66 15.1
## 865   28 28.9
## 866   36  8.1
## 867   97 32.6
## 868   35 55.5
## 869   27 45.1
## 870   20 18.0
## 871   60 27.0
## 872   97 16.8
## 873   66  9.7
## 874   31 16.5
## 875   74  1.0
## 876   40 20.1
## 877   90 51.6
## 878   38 13.2
## 879   82  2.0
## 880   61  2.3
## 881   86 30.4
## 882   24  9.2
## 883   25  4.3
## 884   48 25.2
## 885   32 13.2
## 886   36 29.1
## 887   61  0.4
## 888   72  0.9
## 889   97 14.2
## 890   32 25.7
## 891   79 12.7
## 892   20 28.8
## 893   43 11.1
## 894   43  1.8
## 895   20 13.8
## 896   54  6.3
## 897   30 20.1
## 898   87 15.4
## 899   98  8.4
## 900   47  2.7
## 901   35 22.1
## 902   36 19.2
## 903   81 24.0
## 904   20 26.5
## 905   22  9.0
## 906   44  2.9
## 907   63 60.2
## 908   63  4.8
## 909   22 14.1
## 910   51 76.8
## 911   71  2.4
## 912   48  7.0
## 913   37 53.8
## 914   89  8.8
## 915   71  9.7
## 916   40  5.7
## 917   38 15.0
## 918   46 41.4
## 919   57 11.8
## 920   96 10.9
## 921   66  9.0
## 922   67 26.1
## 923   21  9.9
## 924   50  2.0
## 925   80  0.6
## 926   53 47.4
## 927   23  7.0
## 928  100  7.2
## 929   34  2.9
## 930   35 13.2
## 931   76  6.3
## 932   71 17.3
## 933   69  8.9
## 934   92 13.5
## 935   42  8.4
## 936   82 13.0
## 937   84 41.4
## 938   83 60.5
## 939   50 30.8
## 940   74 27.1
## 941   69 28.9
## 942   28 10.7
## 943   58  5.3
## 944   79  7.1
## 945   27 33.3
## 946   57  2.8
## 947   65  2.0
## 948   80  2.0
## 949   53 44.4
## 950   59 11.2
## 951   25  2.7
## 952   98 78.5
## 953   27 36.0
## 954   95 23.0
## 955   81 17.5
## 956   24 20.3
## 957   98 37.7
## 958   82 10.7
## 959   75  5.5
## 960   20 39.3
## 961   70  4.7
## 962   91  1.3
## 963   20 76.5
## 964   70  7.6
## 965   20 22.7
## 966   59 11.8
## 967   85 43.2
## 968   25 58.1
## 969   53  7.2
## 970   56 20.6
## 971   73 20.7
## 972   58 16.2
## 973   55 65.2
## 974   21 20.2
## 975   48  3.2
## 976   31 11.3
## 977   82 46.3
## 978   98  7.7
## 979   99 40.3
## 980   34 47.3
## 981   67 67.0
## 982   54  9.9
## 983   41  8.3
## 984   68 33.2
## 985   94 30.4
## 986   82 17.9
## 987   22 12.7
## 988   58  0.8
## 989   64 51.5
## 990   69  3.7
## 991   34 21.0
## 992   29 15.4
## 993   84 16.0
## 994   69  1.9
## 995   58  2.8
## 996   67  3.9
## 997   56 23.9
## 998   30 61.3
## 999   66 32.3
## 1000  38 11.7

The structure of the dataset provides a top-level view of the variables it contains, by understanding the structure of the dataset and the attributes it contains, we can better analyze and interpret the data to gain insights into the relationship between these variables and the 10-year ASCVD risk:

str(dataset)
## 'data.frame':    1000 obs. of  10 variables:
##  $ isMale        : int  1 0 0 1 0 0 1 1 0 1 ...
##  $ isBlack       : int  1 0 1 1 0 0 0 0 0 0 ...
##  $ isSmoker      : int  0 0 1 1 1 1 1 1 1 0 ...
##  $ isDiabetic    : int  1 1 1 1 0 0 0 1 0 1 ...
##  $ isHypertensive: int  1 1 1 0 1 1 0 0 1 1 ...
##  $ Age           : int  49 69 50 42 66 52 40 75 42 65 ...
##  $ Systolic      : int  101 167 181 145 134 154 104 136 169 196 ...
##  $ Cholesterol   : int  181 155 147 166 199 174 187 189 179 187 ...
##  $ HDL           : int  32 59 59 46 63 22 52 59 99 46 ...
##  $ Risk          : num  11.1 30.1 37.6 13.2 15.1 17.3 2.1 46 1.7 48.5 ...

Dataset dimensions:

dim(dataset)
## [1] 1000   10
  • Number of rows = 1000 , Number of columns = 10

To summarize the descriptive statistics for all the columns in the dataset, we can calculate various statistical measures for each attribute:

library(Hmisc)
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
describe(dataset)
## dataset 
## 
##  10  Variables      1000  Observations
## --------------------------------------------------------------------------------
## isMale 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1000        0        2     0.75      490     0.49   0.5003 
## 
## --------------------------------------------------------------------------------
## isBlack 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1000        0        2    0.747      530     0.53   0.4987 
## 
## --------------------------------------------------------------------------------
## isSmoker 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1000        0        2    0.749      516    0.516      0.5 
## 
## --------------------------------------------------------------------------------
## isDiabetic 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1000        0        2    0.749      522    0.522   0.4995 
## 
## --------------------------------------------------------------------------------
## isHypertensive 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##     1000        0        2     0.75      495    0.495   0.5005 
## 
## --------------------------------------------------------------------------------
## Age 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1000        0       40    0.999    59.11    13.32       42       43 
##      .25      .50      .75      .90      .95 
##       49       59       69       75       77 
## 
## lowest : 40 41 42 43 44, highest: 75 76 77 78 79
## --------------------------------------------------------------------------------
## Systolic 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1000        0      111        1    144.2    36.69       95      102 
##      .25      .50      .75      .90      .95 
##      117      144      171      189      194 
## 
## lowest :  90  91  92  93  94, highest: 196 197 198 199 200
## --------------------------------------------------------------------------------
## Cholesterol 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1000        0       71        1      164    23.48      133      136 
##      .25      .50      .75      .90      .95 
##      146      164      182      192      196 
## 
## lowest : 130 131 132 133 134, highest: 196 197 198 199 200
## --------------------------------------------------------------------------------
## HDL 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1000        0       81        1     59.6    27.56       23       27 
##      .25      .50      .75      .90      .95 
##       39       59       81       93       97 
## 
## lowest :  20  21  22  23  24, highest:  96  97  98  99 100
## --------------------------------------------------------------------------------
## Risk 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1000        0      439        1    19.67    18.37     1.20     2.20 
##      .25      .50      .75      .90      .95 
##     6.30    14.40    29.00    45.13    55.30 
## 
## lowest : 0.1  0.2  0.3  0.4  0.5 , highest: 76.5 76.8 78.1 78.5 85.4
## --------------------------------------------------------------------------------

To have a better understanding of the values in our Dataset, we applied various statistical measures to the attributes. These measures provide insights into different aspects of the data:

summary(dataset)
##      isMale        isBlack        isSmoker       isDiabetic    isHypertensive 
##  Min.   :0.00   Min.   :0.00   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:0.00   1st Qu.:0.00   1st Qu.:0.000   1st Qu.:0.000   1st Qu.:0.000  
##  Median :0.00   Median :1.00   Median :1.000   Median :1.000   Median :0.000  
##  Mean   :0.49   Mean   :0.53   Mean   :0.516   Mean   :0.522   Mean   :0.495  
##  3rd Qu.:1.00   3rd Qu.:1.00   3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000  
##  Max.   :1.00   Max.   :1.00   Max.   :1.000   Max.   :1.000   Max.   :1.000  
##       Age           Systolic      Cholesterol       HDL             Risk      
##  Min.   :40.00   Min.   : 90.0   Min.   :130   Min.   : 20.0   Min.   : 0.10  
##  1st Qu.:49.00   1st Qu.:117.0   1st Qu.:146   1st Qu.: 39.0   1st Qu.: 6.30  
##  Median :59.00   Median :144.0   Median :164   Median : 59.0   Median :14.40  
##  Mean   :59.11   Mean   :144.2   Mean   :164   Mean   : 59.6   Mean   :19.67  
##  3rd Qu.:69.00   3rd Qu.:171.0   3rd Qu.:182   3rd Qu.: 81.0   3rd Qu.:29.00  
##  Max.   :79.00   Max.   :200.0   Max.   :200   Max.   :100.0   Max.   :85.40

We measured the Variance for all numeric attributes to see the degree of spread in the dataset:

var(dataset$Age)
## [1] 133.0906
var(dataset$Systolic)
## [1] 1009.621
var(dataset$Cholesterol)
## [1] 413.3045
var(dataset$HDL)
## [1] 569.4669
var(dataset$Risk)
## [1] 290.4959

All the attributes’ variance results are higher than their mean values, which implies that the dataset has greater variability and is more heterogeneous. This might indicate that the values in our dataset are more scattered; have a wider range of values, potentially suggesting a more diverse or varied pattern in the data.

Scatter Plot:

library(ggplot2)

ggplot(dataset, aes(x = Age, y =Systolic, color= 'red'))+
  geom_point() +
  xlab("Age") +
  ylab("Blood Pressure")

In order to gain a deeper understanding of our dataset, we examined the attributes “Systolic” and “Age” to determine if there was a predictive or correlational relationship between them. However, after analyzing the scatter plot, we discovered that there is no discernible relationship or correlation between these two attributes.

ggplot(dataset, aes(x = Systolic, y = Risk)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, aes(color = "Regression Line")) +
  facet_wrap(~cut(Age, 3), scales = "free") +
  xlab("Systolic Blood Pressure") +
  ylab("Risk") +
  ggtitle("Relationship between Systolic Blood Pressure and Risk at Different Age Levels") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

However, notable association between ‘Systolic Blood Pressure’, ‘Age’, and ‘Risk’, segmented into various age categories. It shows that risk notably rises with age and Blood Pressure,as the regression line for the age bracket (66,79] exhibits higher risks, indicating a high correlation between advancing age and elevated risk levels in this dataset.

Density Plot:

library(tidyr)

dataset_long <- gather(dataset, key = "column", value = "value", Age:ncol(dataset))

ggplot(dataset_long, aes(x = value, fill = column)) +
  geom_density(alpha = 0.7) +
  facet_wrap(~column, scales = "free") +
  xlab("Value") +
  ylab("Density")

To understand the relative frequency of different values within our dataest we measeured the density, and analyzed the corresponding graphs. Here are the observations we made:

- The graph representing the distribution of ages shows a reasonable representation of ages between 40 and 80 within the dataset. This suggests that the age values are well-distributed within this range.

- Both the density graphs for cholesterol and HDL indicate a slight skew towards lower cholesterol levels. This suggests that the majority of the data points tend to have lower cholesterol values rather than higher ones.

- The density graph for systolic blood pressure displays a uniform distribution across the entire range of blood pressures. This indicates that the data points are evenly spread out without any significant concentration in specific pressure ranges.

- The density graph for the risk variable exhibits a positively skewed (right-skewed) distribution. This implies that there is a higher frequency of data points with lower risk values, while the occurrence of higher risk values is relatively less frequent.

Bar Plot visualization for the ‘isSmoker’ attribute:

bb <- dataset$isSmoker %>% table() %>%
barplot(bb , col = c("lightgreen","darkred"), width= c(4,4.1),space=0.1, names.arg=c("o","1"), legend.text = c("Non-Smoker","Smoker"))

To better understand the smoking status within our dataset, we visualized the data using a bar plot. This visualization was chosen to provide a clear and easily interpretable representation of the differences in smoking status. From the bar plot, we observed that the numbers are nearly evenly distributed between non-smokers (0) and smokers (1). This indicates that there is a balanced representation of individuals who are non-smokers and smokers in the dataset.

Matrix measurement of the correlation in our dataset:

library(corrplot)
## corrplot 0.92 loaded
corr_matrix <- cor(dataset)
corrplot(corr_matrix, method = "color", type = "lower", tl.col = "black", tl.srt = 45, 
          addCoef.col = "black", number.cex = 0.7, tl.cex = 0.7, col = colorRampPalette(c("white", "lightblue"))(90))
## Warning in ind1:ind2: numerical expression has 2 elements: only the first used

By analyzing the correlation matrix of our dataset, we can identify suspicious events and patterns in the data. However, it is evident that there are no strong correlations among the features in the dataset. Despite this, we can rank the correlations in descending order based on their impact on the risk of heart disease.The order of correlations, from highest to lowest in terms of their influence on heart disease risk, is as follows: Age, Systolic blood pressure, is Diabetic, is Smoker, is Hypertensive, gender is male, , race is black, Cholestrol, HDL.

Box Plot:

boxplot(dataset$Age)

The Age boxplot shows a wide range of values that might lead to a lower accuracy of the results when it comes to clculations so we need change it to a standardized range. Additionally, the boxplot analysis indicates that there are no outliers present in the Age attribute. This implies that the Age data points are within a reasonable range and do not deviate significantly from the overall distribution of values.

boxplot(dataset$Systolic)

The boxplot analysis of the Systolic blood pressure attribute reveals the absence of outliers, indicating that the data points are within a reasonable range without any extreme values. However, it is worth noting that the range of Systolic blood pressure is considerably large. To ensure accurate calculations and mitigate potential conflicts, it is recommended to transform the Systolic blood pressure into a smaller and standardized range. This transformation will help normalize the data and make it more suitable for analysis and calculations.

boxplot(dataset$Cholesterol)

According to the boxplot analysis of the Cholesterol attribute, no outliers are observed, suggesting that the data points are within a reasonable range without any extreme values. However, it is important to narrow down the range of values to optimize the accuracy of our calculations. By reducing the range of Cholesterol values, we can improve the reliability and precision of our dataset, enabling us to obtain more reliable and meaningful results.

boxplot(dataset$HDL)

The HDL boxplot reveal that there are no outlires shown. However, it is necessary to transform the range of HDL values to bring them into a standardized and common range. By performing this transformation, we can almost ensure to have better insights and improved data quality.


2-Data cleaning

2.1 Missing values:

Since missing/null values can affect the dataset badly we decided to check it and delete all missing/null values from our dataset to make it as clean as possible so that we can end up with efficint dataset resulting to a higher possibiliaty of accurete results later on.

# Check for missing values
missing_values <- colSums(is.na(dataset))

# Print columns with missing values
print("Columns with missing values:")
## [1] "Columns with missing values:"
print(names(missing_values)[missing_values > 0])
## character(0)
# Print the count of missing values for each column
print("Count of missing values for each column:")
## [1] "Count of missing values for each column:"
print(missing_values)
##         isMale        isBlack       isSmoker     isDiabetic isHypertensive 
##              0              0              0              0              0 
##            Age       Systolic    Cholesterol            HDL           Risk 
##              0              0              0              0              0
The analysis revealed that there are no missing values across any of the attributes.

2.2 Detecting and removing outliers:

In data analysis, checking and removing outliers is crucial to ensure the reliability of statistical insights. Outliers, as extreme data points, can distort summary statistics, potentially leading to inaccurate analyses. By identifying and, if necessary, removing outliers, we enhance the robustness of our findings.

# Compute IRQ
Q1 <- quantile(dataset$Age, 0.25)
Q3 <- quantile(dataset$Age, 0.75)
IQR <- Q3 - Q1

# Identify outliers
lower_bound <- Q1 - (1.5 * IQR)
upper_bound <- Q3 + (1.5 * IQR)
outliers <- which(dataset$Age < lower_bound | dataset$Age > upper_bound)

# Get the number of outliers
num_outliers <- length(outliers)
print(paste("Number of Age outliers:", num_outliers))
## [1] "Number of Age outliers: 0"
# Compute IRQ
Q1 <- quantile(dataset$Systolic, 0.25)
Q3 <- quantile(dataset$Systolic, 0.75)
IQR <- Q3 - Q1

# Identify outliers
lower_bound <- Q1 - (1.5 * IQR)
upper_bound <- Q3 + (1.5 * IQR)
outliers <- which(dataset$Systolic < lower_bound | dataset$Systolic > upper_bound)

# Get the number of outliers
num_outliers <- length(outliers)
print(paste("Number of Systolic outliers:", num_outliers))
## [1] "Number of Systolic outliers: 0"
# Compute IRQ
Q1 <- quantile(dataset$Cholesterol, 0.25)
Q3 <- quantile(dataset$Cholesterol, 0.75)
IQR <- Q3 - Q1

# Identify outliers
lower_bound <- Q1 - (1.5 * IQR)
upper_bound <- Q3 + (1.5 * IQR)
outliers <- which(dataset$Cholesterol < lower_bound | dataset$Cholesterol > upper_bound)

# Get the number of outliers
num_outliers <- length(outliers)
print(paste("Number of Cholesterol outliers:", num_outliers))
## [1] "Number of Cholesterol outliers: 0"
# Compute IRQ
Q1 <- quantile(dataset$HDL, 0.25)
Q3 <- quantile(dataset$HDL, 0.75)
IQR <- Q3 - Q1

# Identify outliers
lower_bound <- Q1 - (1.5 * IQR)
upper_bound <- Q3 + (1.5 * IQR)
outliers <- which(dataset$HDL < lower_bound | dataset$HDL > upper_bound)

# Get the number of outliers
num_outliers <- length(outliers)
print(paste("Number of HDL outliers:", num_outliers))
## [1] "Number of HDL outliers: 0"

The result indicates that there are no outliers, but we will also use a box plot to ensure that there are no outliers.

boxplot(dataset[,c(6,7,8,9)], main="Boxplot with Outliers", col=c("lightblue","lightblue","lightblue","lightblue"))

By using the box plot we can see that there are no outliers in the data set.


3-Data reduction

In analyzing the dataset,The initial dataset provided a comprehensive and relevant set of information for the research objectives without the need for removal or condensation of variables.

used the findCorrelation function in caret library to outputs the index of variables we need to delete. targeting any pair with a correlation coefficient exceeding 0.75.

findCorrelation(cor(dataset), cutoff=0.75)
## integer(0)

In our case, the function finds out that no feature need to be deleted.


4-Data transformation

4.1 normalization

Data normalization is a preprocessing step that involves transforming numerical data within a dataset to a standard, uniform scale. This process ensures that all variables, regardless of their original units or scales, are brought into a consistent and comparable range. the following attributes were selected for normalization:(age, systolic, cholestrol, HDL)

normalize <- function(x)
{
  return ((x - min(x))/ (max(x)- min(x)) )
}

dataset$Age<-normalize(dataset$Age)
dataset$Systolic<-normalize(dataset$Systolic)
dataset$Cholesterol<-normalize(dataset$Cholesterol)
dataset$HDL<-normalize(dataset$HDL)

head(dataset)

we have successfully completed the data normalization. This process entailed scaling our numerical features to a standardized range, typically between 0 and 1.

4.2 Discretization

To make our dataset understandable and easily interpretable, especially when using tree-based classification methods, we transformed the continuous class label ‘Risk’ into specific, categorized risk levels.

These levels are delineated as:

Low risk (<5%), Borderline risk (5% to 7.4%), Intermediate risk (7.5% to 19.9%), and High risk (≥20%).

# Categorize 'Risk' into defined categories
dataset$Risk <- cut(
  dataset$Risk, 
  breaks = c(-Inf, 5, 7.4, 19.9, Inf),
  labels = c("Low risk", "Borderline risk", "Intermediate risk", "High risk"),
  right = FALSE,
  include.lowest = TRUE
)

our dataset after Discretization:

head(dataset)

5- Feature selection

Feature selection is one of the most important task to boost performance of our machine learning model by removing irrelevant features the model will make decisions only using important features. we will use Recursive Feature Elimination (RFE), which is a widely used wrapper-type algorithm for selecting features that are most relevant in predicting the target variable ‘Risk’ in our case.

## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
## 
##     margin
## Loading required package: splines
## Loading required package: foreach
## Loaded gam 1.22-2
# ensure results are repeatable
set.seed(7)

# Define RFE control parameters
ctrl <- rfeControl(functions=rfFuncs, method="cv", number=10)

# Execute RFE using dataset features 1-9 and "Risk" as the class lable
results <- rfe(dataset[,1:9], dataset$Risk, sizes=c(1:9), rfeControl=ctrl)

# Display RFE results
print(results)
## 
## Recursive feature selection
## 
## Outer resampling method: Cross-Validated (10 fold) 
## 
## Resampling performance over subset size:
## 
##  Variables Accuracy  Kappa AccuracySD KappaSD Selected
##          1   0.5832 0.3887    0.03859 0.05413         
##          2   0.5489 0.3401    0.03516 0.05332         
##          3   0.6230 0.4335    0.03123 0.04525         
##          4   0.6671 0.5073    0.04478 0.06397         
##          5   0.6770 0.5222    0.02512 0.03598         
##          6   0.7132 0.5739    0.03336 0.05041         
##          7   0.7821 0.6764    0.03986 0.05887         
##          8   0.7812 0.6748    0.03076 0.04539         
##          9   0.8009 0.7051    0.02630 0.03865        *
## 
## The top 5 variables (out of 9):
##    Age, Systolic, isDiabetic, isSmoker, isMale
plot(results, type=c("g", "o"))

The asterisk (*) in the column indicates the number of features recommended by RFE as yielding the best model according to the resampling results. it shows that when 9 variables are used, the model achieves the best accuracy of approximately 80% and a kappa value of 0.7.

The graphical representation of feature importance :

The “Mean Decrease Gini” score tells us how crucial a feature is for making accurate predictions in a Random Forest model. A higher score means the feature is more valuable in deciding how to classify the data correctly, helping the model make better decisions.

# Setting seed for reproducibility
set.seed(123)

# Fit a random forest model
rf_model <- randomForest(Risk ~ ., data = dataset)
var_imp <- importance(rf_model)
var_imp_df <- data.frame(variables = row.names(var_imp), var_imp)

# Sorting variables based on importance
var_imp_df <- var_imp_df[order(var_imp_df$MeanDecreaseGini, decreasing = TRUE),]

# Plotting variable importance using ggplot2
ggplot(var_imp_df, aes(x = reorder(variables, MeanDecreaseGini), y = MeanDecreaseGini)) +geom_col() +
  coord_flip() +
  labs(title = "Feature Importance",
       x = "Features",
      y = "Importance (Mean Decrease in Gini)")

The graph shows that ‘Age’ and ‘Systolic’ are key variables influencing our model’s predictions of ‘Risk’. while variables like isHypertensive, isBlack were found to have the least impact on the model’s predictive capability.

Overall, we think it’s a good practice to make use of all our features as recommended by RFE, particularly when we are dealing with a modest number, to avoid potential overfitting.we


phase-3

balancing data

Balancing data is crucial for improving the performance and fairness of machine learning models. When data are imbalanced, with one class significantly outnumbering the others, models tend to bias towards the majority class, leading to poor predictive accuracy for minority classes.

Before balancing our data:

# Calculate class distribution
class_distribution <- table(dataset$Risk)
# Create a bar plot
barplot(class_distribution, 
        main = "Class Distribution for Risk",
        xlab = "Risk Level",
        ylab = "Count",
        names.arg = levels(dataset$Risk))

After balancing our data:

library(ROSE)
## Loaded ROSE 0.0-4
balanced_data <- upSample(dataset[, 1:9], dataset$Risk, yname = "Risk")
# Plot the distribution of the "Risk" classes
plot(balanced_data$Risk)

# Check the proportion and count of "Risk" classes
prop_table <- prop.table(table(balanced_data$Risk))
count_table <- table(balanced_data$Risk)

After balancing our data, the model becomes more capable of providing accurate predictions, ensuring a fair evaluation of their performance.

6- Classification

Classification analysis is a fundamental aspect of machine learning, focusing on categorizing data into distinct classes. In our study, we aim to build predictive models that efficiently assign predefined labels to new instances based on their features. To enhance the robustness of our models, we have divided the dataset into three sets: training, validation, and testing. By employing different proportions of training data—60%, 70%, and 80%—we seek to evaluate and compare the models’ performances. This approach ensures a comprehensive understanding of model behavior under varying training scenarios, guiding us to select the most effective model for our specific dataset.

-Decision tree using Gain ratio (C4.5):

Gain ratio is a metric that assesses the quality of a split within decision tree algorithms. to evaluate the quality of a split based on the information gain and the intrinsic information of a feature.we have implemented the Gain Ratio (C4.5) algorithm and the J48 function from the RWeka package. This algorithm partitions our data into training and testing sets, builds a J48 decision tree on the training data,

1-partition the data into ( 60% training, 40% testing):

# Load the RWeka package
library(RWeka)
set.seed(1234)
ind=sample (2, nrow(balanced_data), replace=TRUE, prob=c(0.60 , 0.40))
trainData=balanced_data[ind==1,]
testData=balanced_data[ind==2,]

# Define the formula
myFormula <- Risk ~ .

# Build the J48 decision tree on the training data
C45Fit <- J48(myFormula, data = trainData)

# Create a table to compare predicted vs. actual values on the training data
table(predict(C45Fit), trainData$Risk)
##                    
##                     Low risk Borderline risk Intermediate risk High risk
##   Low risk               240               1                 3         1
##   Borderline risk          6             217                 5         1
##   Intermediate risk        0               0               225        13
##   High risk                0               3                17       227
# Print a summary of the J48 model
print(C45Fit)
## J48 pruned tree
## ------------------
## 
## Age <= 0.564103
## |   HDL <= 0.225
## |   |   Systolic <= 0.545455
## |   |   |   isHypertensive <= 0
## |   |   |   |   Age <= 0.025641: Low risk (6.0)
## |   |   |   |   Age > 0.025641
## |   |   |   |   |   HDL <= 0.0125: Intermediate risk (6.0)
## |   |   |   |   |   HDL > 0.0125
## |   |   |   |   |   |   Cholesterol <= 0.2
## |   |   |   |   |   |   |   Systolic <= 0.290909: Low risk (3.0)
## |   |   |   |   |   |   |   Systolic > 0.290909: Intermediate risk (3.0)
## |   |   |   |   |   |   Cholesterol > 0.2
## |   |   |   |   |   |   |   Age <= 0.128205
## |   |   |   |   |   |   |   |   isBlack <= 0: Borderline risk (2.0)
## |   |   |   |   |   |   |   |   isBlack > 0
## |   |   |   |   |   |   |   |   |   Age <= 0.051282: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   |   Age > 0.051282: Low risk (2.0)
## |   |   |   |   |   |   |   Age > 0.128205
## |   |   |   |   |   |   |   |   Age <= 0.435897: Borderline risk (18.0)
## |   |   |   |   |   |   |   |   Age > 0.435897
## |   |   |   |   |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   |   |   |   |   Age <= 0.461538: Borderline risk (3.0)
## |   |   |   |   |   |   |   |   |   |   Age > 0.461538
## |   |   |   |   |   |   |   |   |   |   |   Systolic <= 0.190909: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   |   |   |   Systolic > 0.190909: Borderline risk (3.0)
## |   |   |   |   |   |   |   |   |   isDiabetic > 0: Intermediate risk (2.0)
## |   |   |   isHypertensive > 0
## |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   Systolic <= 0.309091
## |   |   |   |   |   |   Systolic <= 0.190909
## |   |   |   |   |   |   |   isMale <= 0: Low risk (6.0)
## |   |   |   |   |   |   |   isMale > 0: Intermediate risk (3.0)
## |   |   |   |   |   |   Systolic > 0.190909: Borderline risk (5.0/1.0)
## |   |   |   |   |   Systolic > 0.309091: Intermediate risk (10.0)
## |   |   |   |   isDiabetic > 0
## |   |   |   |   |   isMale <= 0: Intermediate risk (11.0/1.0)
## |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   isBlack <= 0: High risk (3.0/1.0)
## |   |   |   |   |   |   isBlack > 0: Intermediate risk (4.0/1.0)
## |   |   Systolic > 0.545455
## |   |   |   Cholesterol <= 0.014286: Borderline risk (5.0/1.0)
## |   |   |   Cholesterol > 0.014286
## |   |   |   |   isSmoker <= 0
## |   |   |   |   |   Systolic <= 0.681818: High risk (3.0)
## |   |   |   |   |   Systolic > 0.681818
## |   |   |   |   |   |   Age <= 0.461538: Intermediate risk (9.0)
## |   |   |   |   |   |   Age > 0.461538: High risk (3.0/1.0)
## |   |   |   |   isSmoker > 0: High risk (29.0/6.0)
## |   HDL > 0.225
## |   |   Age <= 0.282051
## |   |   |   isBlack <= 0
## |   |   |   |   Cholesterol <= 0.557143
## |   |   |   |   |   Systolic <= 0.718182: Low risk (78.0)
## |   |   |   |   |   Systolic > 0.718182
## |   |   |   |   |   |   isDiabetic <= 0: Low risk (18.0/1.0)
## |   |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   |   HDL <= 0.6375
## |   |   |   |   |   |   |   |   Systolic <= 0.909091: Low risk (5.0)
## |   |   |   |   |   |   |   |   Systolic > 0.909091: Intermediate risk (2.0)
## |   |   |   |   |   |   |   HDL > 0.6375: Borderline risk (11.0)
## |   |   |   |   Cholesterol > 0.557143
## |   |   |   |   |   Systolic <= 0.163636: Low risk (5.0)
## |   |   |   |   |   Systolic > 0.163636
## |   |   |   |   |   |   isSmoker <= 0
## |   |   |   |   |   |   |   Age <= 0.230769: Low risk (15.0)
## |   |   |   |   |   |   |   Age > 0.230769: Borderline risk (8.0/1.0)
## |   |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   |   HDL <= 0.7375: Borderline risk (33.0/4.0)
## |   |   |   |   |   |   |   HDL > 0.7375: Low risk (8.0/1.0)
## |   |   |   isBlack > 0
## |   |   |   |   Systolic <= 0.536364
## |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   Cholesterol <= 0.828571: Low risk (30.0/1.0)
## |   |   |   |   |   |   Cholesterol > 0.828571: Borderline risk (2.0)
## |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   |   isSmoker <= 0
## |   |   |   |   |   |   |   |   isHypertensive <= 0: Low risk (9.0)
## |   |   |   |   |   |   |   |   isHypertensive > 0: Borderline risk (6.0/1.0)
## |   |   |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   |   |   Age <= 0.179487: Borderline risk (12.0/1.0)
## |   |   |   |   |   |   |   |   Age > 0.179487: Intermediate risk (2.0)
## |   |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   |   Systolic <= 0.072727: Low risk (4.0)
## |   |   |   |   |   |   |   Systolic > 0.072727: Intermediate risk (9.0/1.0)
## |   |   |   |   Systolic > 0.536364
## |   |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   |   Age <= 0.205128
## |   |   |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   |   |   Age <= 0.128205: Borderline risk (5.0)
## |   |   |   |   |   |   |   |   Age > 0.128205: Low risk (6.0/1.0)
## |   |   |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   |   |   Cholesterol <= 0.685714: Intermediate risk (5.0/1.0)
## |   |   |   |   |   |   |   |   Cholesterol > 0.685714: Borderline risk (8.0)
## |   |   |   |   |   |   Age > 0.205128
## |   |   |   |   |   |   |   isSmoker <= 0
## |   |   |   |   |   |   |   |   Age <= 0.25641: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   Age > 0.25641: Low risk (2.0)
## |   |   |   |   |   |   |   isSmoker > 0: Intermediate risk (4.0)
## |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   Systolic <= 0.890909
## |   |   |   |   |   |   |   Age <= 0.076923
## |   |   |   |   |   |   |   |   HDL <= 0.5625: Intermediate risk (3.0)
## |   |   |   |   |   |   |   |   HDL > 0.5625: Borderline risk (7.0)
## |   |   |   |   |   |   |   Age > 0.076923
## |   |   |   |   |   |   |   |   Age <= 0.179487: Intermediate risk (7.0)
## |   |   |   |   |   |   |   |   Age > 0.179487
## |   |   |   |   |   |   |   |   |   isDiabetic <= 0: Intermediate risk (4.0/1.0)
## |   |   |   |   |   |   |   |   |   isDiabetic > 0: High risk (2.0)
## |   |   |   |   |   |   Systolic > 0.890909: High risk (7.0)
## |   |   Age > 0.282051
## |   |   |   Systolic <= 0.7
## |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   Age <= 0.487179
## |   |   |   |   |   |   |   Systolic <= 0.381818: Low risk (19.0)
## |   |   |   |   |   |   |   Systolic > 0.381818
## |   |   |   |   |   |   |   |   HDL <= 0.55: Borderline risk (3.0)
## |   |   |   |   |   |   |   |   HDL > 0.55: Low risk (10.0/1.0)
## |   |   |   |   |   |   Age > 0.487179
## |   |   |   |   |   |   |   Cholesterol <= 0.3: Low risk (3.0)
## |   |   |   |   |   |   |   Cholesterol > 0.3
## |   |   |   |   |   |   |   |   Systolic <= 0.363636: Borderline risk (13.0)
## |   |   |   |   |   |   |   |   Systolic > 0.363636
## |   |   |   |   |   |   |   |   |   Cholesterol <= 0.414286: Borderline risk (2.0)
## |   |   |   |   |   |   |   |   |   Cholesterol > 0.414286: Intermediate risk (2.0)
## |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   Systolic <= 0.663636
## |   |   |   |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   |   |   |   isSmoker <= 0: Low risk (5.0)
## |   |   |   |   |   |   |   |   isSmoker > 0: Intermediate risk (2.0)
## |   |   |   |   |   |   |   isHypertensive > 0: Intermediate risk (18.0)
## |   |   |   |   |   |   Systolic > 0.663636: Borderline risk (7.0)
## |   |   |   |   isDiabetic > 0
## |   |   |   |   |   Age <= 0.461538
## |   |   |   |   |   |   isSmoker <= 0
## |   |   |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   |   |   |   |   HDL <= 0.675: Borderline risk (8.0/1.0)
## |   |   |   |   |   |   |   |   |   HDL > 0.675: Low risk (4.0)
## |   |   |   |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   |   |   |   Systolic <= 0.290909: Low risk (2.0)
## |   |   |   |   |   |   |   |   |   Systolic > 0.290909: Intermediate risk (2.0)
## |   |   |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   |   |   Systolic <= 0.072727: Borderline risk (12.0)
## |   |   |   |   |   |   |   |   Systolic > 0.072727
## |   |   |   |   |   |   |   |   |   isHypertensive <= 0: Intermediate risk (5.0)
## |   |   |   |   |   |   |   |   |   isHypertensive > 0: Borderline risk (4.0)
## |   |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   |   |   |   Systolic <= 0.6
## |   |   |   |   |   |   |   |   |   Cholesterol <= 0.628571: Borderline risk (13.0/1.0)
## |   |   |   |   |   |   |   |   |   Cholesterol > 0.628571: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   Systolic > 0.6: High risk (2.0)
## |   |   |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   |   |   isMale <= 0: Intermediate risk (3.0/1.0)
## |   |   |   |   |   |   |   |   isMale > 0: High risk (2.0)
## |   |   |   |   |   Age > 0.461538
## |   |   |   |   |   |   Cholesterol <= 0.328571: Borderline risk (2.0)
## |   |   |   |   |   |   Cholesterol > 0.328571: Intermediate risk (19.0/1.0)
## |   |   |   Systolic > 0.7
## |   |   |   |   Systolic <= 0.9
## |   |   |   |   |   isSmoker <= 0: Intermediate risk (12.0)
## |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   Age <= 0.384615: Intermediate risk (6.0)
## |   |   |   |   |   |   Age > 0.384615: High risk (7.0/1.0)
## |   |   |   |   Systolic > 0.9
## |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   Systolic <= 0.936364: Borderline risk (7.0)
## |   |   |   |   |   |   Systolic > 0.936364: Intermediate risk (5.0/1.0)
## |   |   |   |   |   isDiabetic > 0: High risk (4.0)
## Age > 0.564103
## |   Systolic <= 0.5
## |   |   isDiabetic <= 0
## |   |   |   HDL <= 0.15
## |   |   |   |   Systolic <= 0.190909
## |   |   |   |   |   isMale <= 0: Low risk (2.0)
## |   |   |   |   |   isMale > 0: Intermediate risk (2.0)
## |   |   |   |   Systolic > 0.190909: High risk (9.0)
## |   |   |   HDL > 0.15
## |   |   |   |   Systolic <= 0.427273
## |   |   |   |   |   Cholesterol <= 0.7
## |   |   |   |   |   |   Systolic <= 0.290909
## |   |   |   |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   |   |   |   HDL <= 0.6: Intermediate risk (7.0)
## |   |   |   |   |   |   |   |   HDL > 0.6
## |   |   |   |   |   |   |   |   |   Cholesterol <= 0.371429: Intermediate risk (3.0)
## |   |   |   |   |   |   |   |   |   Cholesterol > 0.371429
## |   |   |   |   |   |   |   |   |   |   Systolic <= 0.054545: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   |   |   Systolic > 0.054545: Borderline risk (18.0)
## |   |   |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   |   |   Systolic <= 0.172727
## |   |   |   |   |   |   |   |   |   Age <= 0.692308: Low risk (3.0)
## |   |   |   |   |   |   |   |   |   Age > 0.692308: Intermediate risk (3.0)
## |   |   |   |   |   |   |   |   Systolic > 0.172727: Borderline risk (7.0/1.0)
## |   |   |   |   |   |   Systolic > 0.290909: Intermediate risk (15.0/1.0)
## |   |   |   |   |   Cholesterol > 0.7
## |   |   |   |   |   |   Age <= 0.897436: Intermediate risk (12.0)
## |   |   |   |   |   |   Age > 0.897436
## |   |   |   |   |   |   |   Systolic <= 0.209091: Intermediate risk (3.0/1.0)
## |   |   |   |   |   |   |   Systolic > 0.209091: High risk (3.0)
## |   |   |   |   Systolic > 0.427273
## |   |   |   |   |   Systolic <= 0.472727: High risk (5.0)
## |   |   |   |   |   Systolic > 0.472727: Borderline risk (5.0)
## |   |   isDiabetic > 0
## |   |   |   isSmoker <= 0
## |   |   |   |   Age <= 0.923077
## |   |   |   |   |   Systolic <= 0.318182: Intermediate risk (21.0/3.0)
## |   |   |   |   |   Systolic > 0.318182: High risk (8.0/1.0)
## |   |   |   |   Age > 0.923077: High risk (5.0)
## |   |   |   isSmoker > 0
## |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   isBlack <= 0
## |   |   |   |   |   |   Age <= 0.794872: Intermediate risk (4.0)
## |   |   |   |   |   |   Age > 0.794872: High risk (2.0)
## |   |   |   |   |   isBlack > 0: High risk (3.0)
## |   |   |   |   isHypertensive > 0: High risk (22.0)
## |   Systolic > 0.5: High risk (128.0/10.0)
## 
## Number of Leaves  :  110
## 
## Size of the tree :   219
# Plot the J48 decision tree
plot(C45Fit)

# Make predictions using the J48 model on the test data
testPred <- predict(C45Fit, newdata = testData)

# Create a confusion matrix
conf_matrix <- table(testPred, testData$Risk)

# Display the confusion matrix
print(conf_matrix)
##                    
## testPred            Low risk Borderline risk Intermediate risk High risk
##   Low risk               126               4                 5         1
##   Borderline risk         16             165                24        12
##   Intermediate risk        6               0                87        27
##   High risk                3               7                31       115
# Calculate performance metrics
accuracy_G1 <- sum((diag(conf_matrix)) / sum(conf_matrix))
error_rate_G1 <-( 1 - accuracy_G1)
sensitivity_G1 <- conf_matrix[4, 4] / sum(conf_matrix[4, ])
specificity_G1 <- sum(diag(conf_matrix[-4, -4])) / sum(conf_matrix[-4, ])
precision_G1 <- conf_matrix[4, 4] / sum(conf_matrix[, 4])


# Display performance metrics
cat("Accuracy: ", accuracy_G1, "\n")
## Accuracy:  0.7837838
cat("Error Rate: ", error_rate_G1, "\n")
## Error Rate:  0.2162162
cat("Sensitivity (Recall): ", sensitivity_G1, "\n")
## Sensitivity (Recall):  0.7371795
cat("Specificity: ", specificity_G1, "\n")
## Specificity:  0.7991543
cat("Precision: ", precision_G1, "\n")
## Precision:  0.7419355

Analysis:

- The C4.5 decision tree, employing the gain ratio criterion, showcases robust performance on our dataset with an accuracy of 78.38%. Its ability to effectively capture complex relationships is reflected in the tree’s structure, consisting of 219 nodes and 110 leaves. Notably, the model demonstrates a balanced trade-off between sensitivity (73.72%) and specificity (79.92%), indicating its proficiency in correctly identifying positive and negative instances. With a precision of 74.19%, the model reliably makes accurate positive predictions.

2-partition the data into ( 70% training, 30% testing):

set.seed(1234)
ind=sample (2, nrow(balanced_data), replace=TRUE, prob=c(0.70 , 0.30))
trainData=balanced_data[ind==1,]
testData=balanced_data[ind==2,]

  # Define the formula
myFormula <- Risk ~ .

# Build the J48 decision tree on the training data
C45Fit <- J48(myFormula, data = trainData )

# Create a table to compare predicted vs. actual values on the training data
table(predict(C45Fit), trainData$Risk)
##                    
##                     Low risk Borderline risk Intermediate risk High risk
##   Low risk               272               1                 5         1
##   Borderline risk          4             270                 6         4
##   Intermediate risk        5               0               265        12
##   High risk                0               0                14       273
# Print a summary of the J48 model
print(C45Fit)
## J48 pruned tree
## ------------------
## 
## Age <= 0.564103
## |   HDL <= 0.225
## |   |   Systolic <= 0.545455
## |   |   |   isDiabetic <= 0
## |   |   |   |   isSmoker <= 0
## |   |   |   |   |   Age <= 0.25641: Low risk (13.0)
## |   |   |   |   |   Age > 0.25641
## |   |   |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   |   |   Cholesterol <= 0.257143: Intermediate risk (2.0)
## |   |   |   |   |   |   |   Cholesterol > 0.257143: Borderline risk (12.0/1.0)
## |   |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   |   Systolic <= 0.081818: Low risk (3.0)
## |   |   |   |   |   |   |   Systolic > 0.081818
## |   |   |   |   |   |   |   |   Age <= 0.538462: Intermediate risk (5.0)
## |   |   |   |   |   |   |   |   Age > 0.538462: Borderline risk (2.0)
## |   |   |   |   isSmoker > 0
## |   |   |   |   |   isBlack <= 0
## |   |   |   |   |   |   Systolic <= 0.309091: Borderline risk (9.0/1.0)
## |   |   |   |   |   |   Systolic > 0.309091: Intermediate risk (2.0)
## |   |   |   |   |   isBlack > 0: Intermediate risk (13.0/1.0)
## |   |   |   isDiabetic > 0
## |   |   |   |   Age <= 0.410256
## |   |   |   |   |   Cholesterol <= 0.357143: Intermediate risk (8.0/1.0)
## |   |   |   |   |   Cholesterol > 0.357143
## |   |   |   |   |   |   isHypertensive <= 0: Borderline risk (17.0/1.0)
## |   |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   |   Cholesterol <= 0.685714
## |   |   |   |   |   |   |   |   isSmoker <= 0: Borderline risk (3.0)
## |   |   |   |   |   |   |   |   isSmoker > 0: High risk (2.0)
## |   |   |   |   |   |   |   Cholesterol > 0.685714: Intermediate risk (4.0)
## |   |   |   |   Age > 0.410256
## |   |   |   |   |   Cholesterol <= 0.271429: Low risk (3.0/1.0)
## |   |   |   |   |   Cholesterol > 0.271429: Intermediate risk (12.0/1.0)
## |   |   Systolic > 0.545455
## |   |   |   Cholesterol <= 0.014286: Borderline risk (5.0/1.0)
## |   |   |   Cholesterol > 0.014286
## |   |   |   |   isSmoker <= 0
## |   |   |   |   |   isDiabetic <= 0: Intermediate risk (12.0/1.0)
## |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   isMale <= 0: Intermediate risk (4.0/1.0)
## |   |   |   |   |   |   isMale > 0: High risk (5.0)
## |   |   |   |   isSmoker > 0
## |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   Cholesterol <= 0.242857: Intermediate risk (2.0)
## |   |   |   |   |   |   Cholesterol > 0.242857: High risk (13.0/2.0)
## |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   HDL <= 0.2: High risk (17.0)
## |   |   |   |   |   |   HDL > 0.2: Intermediate risk (3.0/1.0)
## |   HDL > 0.225
## |   |   Age <= 0.282051
## |   |   |   Systolic <= 0.163636
## |   |   |   |   isBlack <= 0: Low risk (44.0)
## |   |   |   |   isBlack > 0
## |   |   |   |   |   isMale <= 0: Low risk (9.0)
## |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   Systolic <= 0.090909: Low risk (6.0)
## |   |   |   |   |   |   Systolic > 0.090909: Intermediate risk (4.0)
## |   |   |   Systolic > 0.163636
## |   |   |   |   isBlack <= 0
## |   |   |   |   |   Cholesterol <= 0.242857: Low risk (38.0/1.0)
## |   |   |   |   |   Cholesterol > 0.242857
## |   |   |   |   |   |   HDL <= 0.8125
## |   |   |   |   |   |   |   isSmoker <= 0
## |   |   |   |   |   |   |   |   Age <= 0.230769: Low risk (31.0)
## |   |   |   |   |   |   |   |   Age > 0.230769
## |   |   |   |   |   |   |   |   |   isMale <= 0: Low risk (2.0)
## |   |   |   |   |   |   |   |   |   isMale > 0: Borderline risk (12.0)
## |   |   |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   |   |   Systolic <= 0.309091
## |   |   |   |   |   |   |   |   |   Systolic <= 0.218182: Borderline risk (3.0)
## |   |   |   |   |   |   |   |   |   Systolic > 0.218182: Low risk (9.0)
## |   |   |   |   |   |   |   |   Systolic > 0.309091
## |   |   |   |   |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   |   |   |   |   isHypertensive <= 0: Borderline risk (17.0/1.0)
## |   |   |   |   |   |   |   |   |   |   isHypertensive > 0: Low risk (4.0)
## |   |   |   |   |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   |   |   |   |   Systolic <= 0.9
## |   |   |   |   |   |   |   |   |   |   |   HDL <= 0.4625
## |   |   |   |   |   |   |   |   |   |   |   |   isDiabetic <= 0: Borderline risk (8.0/1.0)
## |   |   |   |   |   |   |   |   |   |   |   |   isDiabetic > 0: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   |   |   |   HDL > 0.4625: Borderline risk (23.0)
## |   |   |   |   |   |   |   |   |   |   Systolic > 0.9: Intermediate risk (2.0)
## |   |   |   |   |   |   HDL > 0.8125
## |   |   |   |   |   |   |   isMale <= 0: Low risk (17.0)
## |   |   |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   |   |   Age <= 0.076923: Low risk (3.0)
## |   |   |   |   |   |   |   |   Age > 0.076923: Intermediate risk (2.0)
## |   |   |   |   isBlack > 0
## |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   Systolic <= 0.554545
## |   |   |   |   |   |   |   Systolic <= 0.245455
## |   |   |   |   |   |   |   |   isMale <= 0: Low risk (2.0)
## |   |   |   |   |   |   |   |   isMale > 0: Borderline risk (15.0/1.0)
## |   |   |   |   |   |   |   Systolic > 0.245455
## |   |   |   |   |   |   |   |   isHypertensive <= 0: Low risk (20.0/2.0)
## |   |   |   |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   |   |   |   HDL <= 0.4625: Borderline risk (5.0/1.0)
## |   |   |   |   |   |   |   |   |   HDL > 0.4625: Low risk (5.0)
## |   |   |   |   |   |   Systolic > 0.554545
## |   |   |   |   |   |   |   isSmoker <= 0
## |   |   |   |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   |   |   |   Age <= 0.153846: Borderline risk (5.0/1.0)
## |   |   |   |   |   |   |   |   |   Age > 0.153846: Low risk (6.0)
## |   |   |   |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   |   |   |   Cholesterol <= 0.7
## |   |   |   |   |   |   |   |   |   |   Systolic <= 0.718182: Borderline risk (3.0)
## |   |   |   |   |   |   |   |   |   |   Systolic > 0.718182: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   |   Cholesterol > 0.7: Borderline risk (10.0)
## |   |   |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   |   |   Age <= 0: Borderline risk (5.0/1.0)
## |   |   |   |   |   |   |   |   Age > 0
## |   |   |   |   |   |   |   |   |   Cholesterol <= 0.071429: Low risk (2.0)
## |   |   |   |   |   |   |   |   |   Cholesterol > 0.071429
## |   |   |   |   |   |   |   |   |   |   Cholesterol <= 0.871429: Intermediate risk (12.0)
## |   |   |   |   |   |   |   |   |   |   Cholesterol > 0.871429: High risk (3.0/1.0)
## |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   Systolic <= 0.309091
## |   |   |   |   |   |   |   HDL <= 0.4375
## |   |   |   |   |   |   |   |   Age <= 0.205128: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   Age > 0.205128: Borderline risk (3.0)
## |   |   |   |   |   |   |   HDL > 0.4375: Low risk (6.0)
## |   |   |   |   |   |   Systolic > 0.309091
## |   |   |   |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   |   |   |   Cholesterol <= 0.314286: Borderline risk (5.0/1.0)
## |   |   |   |   |   |   |   |   Cholesterol > 0.314286: Intermediate risk (10.0/2.0)
## |   |   |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   |   |   Age <= 0.153846
## |   |   |   |   |   |   |   |   |   Systolic <= 0.881818: Intermediate risk (9.0/1.0)
## |   |   |   |   |   |   |   |   |   Systolic > 0.881818: High risk (2.0)
## |   |   |   |   |   |   |   |   Age > 0.153846: High risk (6.0)
## |   |   Age > 0.282051
## |   |   |   Systolic <= 0.254545
## |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   isHypertensive <= 0: Low risk (20.0)
## |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   |   Cholesterol <= 0.385714: Low risk (5.0)
## |   |   |   |   |   |   |   Cholesterol > 0.385714: Borderline risk (15.0/1.0)
## |   |   |   |   |   |   isMale > 0: Intermediate risk (6.0/1.0)
## |   |   |   |   isDiabetic > 0
## |   |   |   |   |   Age <= 0.435897
## |   |   |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   |   |   Systolic <= 0.2: Borderline risk (21.0/1.0)
## |   |   |   |   |   |   |   Systolic > 0.2: Low risk (3.0/1.0)
## |   |   |   |   |   |   isHypertensive > 0: Low risk (3.0)
## |   |   |   |   |   Age > 0.435897: Intermediate risk (10.0)
## |   |   |   Systolic > 0.254545
## |   |   |   |   isMale <= 0
## |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   Age <= 0.384615
## |   |   |   |   |   |   |   HDL <= 0.5125: Intermediate risk (2.0)
## |   |   |   |   |   |   |   HDL > 0.5125: Low risk (12.0)
## |   |   |   |   |   |   Age > 0.384615
## |   |   |   |   |   |   |   Cholesterol <= 0.814286
## |   |   |   |   |   |   |   |   Systolic <= 0.936364
## |   |   |   |   |   |   |   |   |   Cholesterol <= 0.414286
## |   |   |   |   |   |   |   |   |   |   isSmoker <= 0
## |   |   |   |   |   |   |   |   |   |   |   Age <= 0.512821: Low risk (2.0)
## |   |   |   |   |   |   |   |   |   |   |   Age > 0.512821: Borderline risk (8.0)
## |   |   |   |   |   |   |   |   |   |   isSmoker > 0: Borderline risk (7.0)
## |   |   |   |   |   |   |   |   |   Cholesterol > 0.414286
## |   |   |   |   |   |   |   |   |   |   Age <= 0.512821: Borderline risk (5.0)
## |   |   |   |   |   |   |   |   |   |   Age > 0.512821: Intermediate risk (3.0)
## |   |   |   |   |   |   |   |   Systolic > 0.936364: Intermediate risk (2.0)
## |   |   |   |   |   |   |   Cholesterol > 0.814286: Low risk (3.0/1.0)
## |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   |   |   Systolic <= 0.609091
## |   |   |   |   |   |   |   |   isBlack <= 0
## |   |   |   |   |   |   |   |   |   Age <= 0.333333: Low risk (2.0)
## |   |   |   |   |   |   |   |   |   Age > 0.333333: Borderline risk (13.0)
## |   |   |   |   |   |   |   |   isBlack > 0
## |   |   |   |   |   |   |   |   |   isSmoker <= 0: Borderline risk (4.0)
## |   |   |   |   |   |   |   |   |   isSmoker > 0: Intermediate risk (3.0)
## |   |   |   |   |   |   |   Systolic > 0.609091
## |   |   |   |   |   |   |   |   isBlack <= 0: Intermediate risk (6.0/1.0)
## |   |   |   |   |   |   |   |   isBlack > 0: High risk (2.0)
## |   |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   |   Systolic <= 0.827273
## |   |   |   |   |   |   |   |   isBlack <= 0: Intermediate risk (9.0)
## |   |   |   |   |   |   |   |   isBlack > 0
## |   |   |   |   |   |   |   |   |   Cholesterol <= 0.814286: Intermediate risk (7.0)
## |   |   |   |   |   |   |   |   |   Cholesterol > 0.814286: High risk (2.0)
## |   |   |   |   |   |   |   Systolic > 0.827273: High risk (3.0)
## |   |   |   |   isMale > 0
## |   |   |   |   |   Cholesterol <= 0.914286
## |   |   |   |   |   |   isSmoker <= 0
## |   |   |   |   |   |   |   isDiabetic <= 0: Intermediate risk (18.0)
## |   |   |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   |   |   isHypertensive <= 0: Intermediate risk (6.0/1.0)
## |   |   |   |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   |   |   |   Age <= 0.435897: Borderline risk (4.0)
## |   |   |   |   |   |   |   |   |   Age > 0.435897: Intermediate risk (2.0)
## |   |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   |   |   isHypertensive <= 0: Intermediate risk (7.0)
## |   |   |   |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   |   |   |   Systolic <= 0.690909: Intermediate risk (7.0)
## |   |   |   |   |   |   |   |   |   Systolic > 0.690909: High risk (4.0)
## |   |   |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   |   |   Cholesterol <= 0.128571: Intermediate risk (4.0)
## |   |   |   |   |   |   |   |   Cholesterol > 0.128571: High risk (11.0/1.0)
## |   |   |   |   |   Cholesterol > 0.914286
## |   |   |   |   |   |   isHypertensive <= 0: Intermediate risk (2.0)
## |   |   |   |   |   |   isHypertensive > 0: Borderline risk (7.0)
## Age > 0.564103
## |   Systolic <= 0.5
## |   |   isDiabetic <= 0
## |   |   |   HDL <= 0.15
## |   |   |   |   Systolic <= 0.190909
## |   |   |   |   |   isMale <= 0: Low risk (2.0)
## |   |   |   |   |   isMale > 0: Intermediate risk (2.0)
## |   |   |   |   Systolic > 0.190909: High risk (9.0)
## |   |   |   HDL > 0.15
## |   |   |   |   Age <= 0.692308
## |   |   |   |   |   isSmoker <= 0
## |   |   |   |   |   |   Systolic <= 0.172727: Low risk (4.0/1.0)
## |   |   |   |   |   |   Systolic > 0.172727
## |   |   |   |   |   |   |   Age <= 0.589744: Intermediate risk (3.0/1.0)
## |   |   |   |   |   |   |   Age > 0.589744: Borderline risk (22.0)
## |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   Age <= 0.589744: Borderline risk (3.0)
## |   |   |   |   |   |   Age > 0.589744: Intermediate risk (9.0/1.0)
## |   |   |   |   Age > 0.692308
## |   |   |   |   |   HDL <= 0.975
## |   |   |   |   |   |   Systolic <= 0.427273
## |   |   |   |   |   |   |   Cholesterol <= 0.057143
## |   |   |   |   |   |   |   |   Cholesterol <= 0.028571: Intermediate risk (5.0)
## |   |   |   |   |   |   |   |   Cholesterol > 0.028571: Borderline risk (4.0)
## |   |   |   |   |   |   |   Cholesterol > 0.057143
## |   |   |   |   |   |   |   |   isSmoker <= 0: Intermediate risk (25.0/2.0)
## |   |   |   |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   |   |   |   Age <= 0.769231: Intermediate risk (4.0)
## |   |   |   |   |   |   |   |   |   Age > 0.769231
## |   |   |   |   |   |   |   |   |   |   Systolic <= 0.072727: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   |   |   Systolic > 0.072727: High risk (4.0)
## |   |   |   |   |   |   Systolic > 0.427273: High risk (5.0)
## |   |   |   |   |   HDL > 0.975: Borderline risk (5.0)
## |   |   isDiabetic > 0
## |   |   |   isSmoker <= 0
## |   |   |   |   Systolic <= 0.318182
## |   |   |   |   |   Age <= 0.820513: Intermediate risk (18.0/1.0)
## |   |   |   |   |   Age > 0.820513
## |   |   |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   |   |   Age <= 0.948718: Intermediate risk (4.0)
## |   |   |   |   |   |   |   Age > 0.948718: High risk (4.0/1.0)
## |   |   |   |   |   |   isHypertensive > 0: High risk (3.0)
## |   |   |   |   Systolic > 0.318182: High risk (10.0/1.0)
## |   |   |   isSmoker > 0
## |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   isBlack <= 0
## |   |   |   |   |   |   Age <= 0.794872: Intermediate risk (4.0)
## |   |   |   |   |   |   Age > 0.794872: High risk (2.0)
## |   |   |   |   |   isBlack > 0: High risk (4.0)
## |   |   |   |   isHypertensive > 0: High risk (28.0)
## |   Systolic > 0.5
## |   |   Age <= 0.589744
## |   |   |   isDiabetic <= 0: Borderline risk (4.0/1.0)
## |   |   |   isDiabetic > 0: High risk (7.0/1.0)
## |   |   Age > 0.589744: High risk (141.0/7.0)
## 
## Number of Leaves  :  131
## 
## Size of the tree :   261
# Plot the J48 decision tree
plot(C45Fit)

#
# Make predictions using the J48 model on the test data
testPred <- predict(C45Fit, newdata = testData)

# Create a confusion matrix
conf_matrix <- table(testPred, testData$Risk)

# Display the confusion matrix
print(conf_matrix)
##                    
## testPred            Low risk Borderline risk Intermediate risk High risk
##   Low risk                94               2                 9         2
##   Borderline risk         10             124                10         4
##   Intermediate risk       12               0                66        23
##   High risk                0               0                22        78
# Calculate performance metrics
accuracy_G2 <- sum((diag(conf_matrix)) / sum(conf_matrix))
error_rate_G2 <-( 1 - accuracy_G2)
sensitivity_G2 <- conf_matrix[4, 4] / sum(conf_matrix[4, ])
specificity_G2 <- sum(diag(conf_matrix[-4, -4])) / sum(conf_matrix[-4, ])
precision_G2 <- conf_matrix[4, 4] / sum(conf_matrix[, 4])


# Display performance metrics
cat("Accuracy: ", accuracy_G2, "\n")
## Accuracy:  0.7938596
cat("Error Rate: ", error_rate_G2, "\n")
## Error Rate:  0.2061404
cat("Sensitivity (Recall): ", sensitivity_G2, "\n")
## Sensitivity (Recall):  0.78
cat("Specificity: ", specificity_G2, "\n")
## Specificity:  0.7977528
cat("Precision: ", precision_G2, "\n")
## Precision:  0.728972

Analysis:

The C4.5 decision tree, employing the gain ratio criterion, exhibits strong predictive accuracy with an impressive 79.39%. Characterized by 261 nodes and 131 leaves, the tree’s depth allows it to capture intricate patterns within the data. Notably, the model strikes a balance between sensitivity (78%) and specificity (79.78%), showcasing its ability to effectively identify positive and negative instances. With a precision of 72.90%, the model demonstrates accuracy in positive predictions.

3-partition the data into ( 80% training, 20% testing):

set.seed(1234)
ind=sample (2, nrow(balanced_data), replace=TRUE, prob=c(0.80 , 0.20))
trainData=balanced_data[ind==1,]
testData=balanced_data[ind==2,]

 # Define the formula
myFormula <- Risk ~ .

# Build the J48 decision tree on the training data
C45Fit <- J48(myFormula, data = trainData)

# Create a table to compare predicted vs. actual values on the training data
table(predict(C45Fit), trainData$Risk)
##                    
##                     Low risk Borderline risk Intermediate risk High risk
##   Low risk               317               0                 7         1
##   Borderline risk          3             304                 6         0
##   Intermediate risk        2               0               301        21
##   High risk                0               1                10       299
# Print a summary of the J48 model
print(C45Fit)
## J48 pruned tree
## ------------------
## 
## Age <= 0.564103
## |   Age <= 0.333333
## |   |   HDL <= 0.25
## |   |   |   Systolic <= 0.545455
## |   |   |   |   isSmoker <= 0
## |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   Systolic <= 0.309091: Low risk (14.0)
## |   |   |   |   |   |   Systolic > 0.309091
## |   |   |   |   |   |   |   Age <= 0.076923: Low risk (5.0)
## |   |   |   |   |   |   |   Age > 0.076923: Borderline risk (11.0)
## |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   isMale <= 0: Intermediate risk (3.0)
## |   |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   |   Cholesterol <= 0.457143: Low risk (3.0/1.0)
## |   |   |   |   |   |   |   Cholesterol > 0.457143
## |   |   |   |   |   |   |   |   isHypertensive <= 0: Borderline risk (5.0)
## |   |   |   |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   |   |   |   isBlack <= 0: Borderline risk (4.0)
## |   |   |   |   |   |   |   |   |   isBlack > 0: Intermediate risk (2.0)
## |   |   |   |   isSmoker > 0
## |   |   |   |   |   Systolic <= 0.327273
## |   |   |   |   |   |   Age <= 0: Low risk (3.0/1.0)
## |   |   |   |   |   |   Age > 0
## |   |   |   |   |   |   |   Age <= 0.205128
## |   |   |   |   |   |   |   |   HDL <= 0.0125: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   HDL > 0.0125
## |   |   |   |   |   |   |   |   |   isDiabetic <= 0: Borderline risk (12.0)
## |   |   |   |   |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   |   |   |   |   Systolic <= 0.1: Borderline risk (6.0)
## |   |   |   |   |   |   |   |   |   |   Systolic > 0.1: Intermediate risk (2.0)
## |   |   |   |   |   |   |   Age > 0.205128: Intermediate risk (3.0/1.0)
## |   |   |   |   |   Systolic > 0.327273
## |   |   |   |   |   |   Systolic <= 0.372727: Low risk (3.0/1.0)
## |   |   |   |   |   |   Systolic > 0.372727: Intermediate risk (9.0)
## |   |   |   Systolic > 0.545455
## |   |   |   |   Systolic <= 0.618182
## |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   isHypertensive <= 0: Borderline risk (10.0)
## |   |   |   |   |   |   isHypertensive > 0: Intermediate risk (2.0/1.0)
## |   |   |   |   |   isDiabetic > 0: High risk (6.0/1.0)
## |   |   |   |   Systolic > 0.618182
## |   |   |   |   |   isSmoker <= 0
## |   |   |   |   |   |   Age <= 0.025641: Low risk (2.0)
## |   |   |   |   |   |   Age > 0.025641
## |   |   |   |   |   |   |   isBlack <= 0: Intermediate risk (5.0)
## |   |   |   |   |   |   |   isBlack > 0: High risk (4.0/1.0)
## |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   isBlack <= 0
## |   |   |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   |   |   Cholesterol <= 0.485714: Intermediate risk (3.0)
## |   |   |   |   |   |   |   |   Cholesterol > 0.485714: High risk (2.0)
## |   |   |   |   |   |   |   isMale > 0: High risk (4.0)
## |   |   |   |   |   |   isBlack > 0: High risk (14.0/1.0)
## |   |   HDL > 0.25
## |   |   |   isBlack <= 0
## |   |   |   |   isSmoker <= 0
## |   |   |   |   |   Age <= 0.230769: Low risk (82.0)
## |   |   |   |   |   Age > 0.230769
## |   |   |   |   |   |   isDiabetic <= 0: Low risk (15.0)
## |   |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   |   isMale <= 0: Low risk (6.0)
## |   |   |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   |   |   Systolic <= 0.145455: Low risk (2.0)
## |   |   |   |   |   |   |   |   Systolic > 0.145455: Borderline risk (14.0)
## |   |   |   |   isSmoker > 0
## |   |   |   |   |   HDL <= 0.8125
## |   |   |   |   |   |   Systolic <= 0.309091
## |   |   |   |   |   |   |   Age <= 0.179487: Low risk (20.0)
## |   |   |   |   |   |   |   Age > 0.179487
## |   |   |   |   |   |   |   |   isDiabetic <= 0: Low risk (2.0)
## |   |   |   |   |   |   |   |   isDiabetic > 0: Borderline risk (6.0)
## |   |   |   |   |   |   Systolic > 0.309091
## |   |   |   |   |   |   |   Cholesterol <= 0.228571
## |   |   |   |   |   |   |   |   isMale <= 0: Low risk (6.0)
## |   |   |   |   |   |   |   |   isMale > 0: Intermediate risk (3.0/1.0)
## |   |   |   |   |   |   |   Cholesterol > 0.228571
## |   |   |   |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   |   |   |   isHypertensive <= 0: Borderline risk (20.0/1.0)
## |   |   |   |   |   |   |   |   |   isHypertensive > 0: Low risk (5.0)
## |   |   |   |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   |   |   |   HDL <= 0.55
## |   |   |   |   |   |   |   |   |   |   Cholesterol <= 0.871429: Intermediate risk (5.0)
## |   |   |   |   |   |   |   |   |   |   Cholesterol > 0.871429: Borderline risk (3.0)
## |   |   |   |   |   |   |   |   |   HDL > 0.55: Borderline risk (27.0)
## |   |   |   |   |   HDL > 0.8125
## |   |   |   |   |   |   Cholesterol <= 0.742857: Low risk (29.0)
## |   |   |   |   |   |   Cholesterol > 0.742857
## |   |   |   |   |   |   |   isMale <= 0: Low risk (3.0)
## |   |   |   |   |   |   |   isMale > 0: Intermediate risk (2.0)
## |   |   |   isBlack > 0
## |   |   |   |   Systolic <= 0.536364
## |   |   |   |   |   Cholesterol <= 0.828571
## |   |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   |   |   Cholesterol <= 0.785714: Low risk (29.0)
## |   |   |   |   |   |   |   |   Cholesterol > 0.785714
## |   |   |   |   |   |   |   |   |   Age <= 0.230769: Low risk (2.0)
## |   |   |   |   |   |   |   |   |   Age > 0.230769: Borderline risk (2.0)
## |   |   |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   |   |   Systolic <= 0.327273: Low risk (11.0)
## |   |   |   |   |   |   |   |   Systolic > 0.327273
## |   |   |   |   |   |   |   |   |   Age <= 0.076923: Low risk (2.0)
## |   |   |   |   |   |   |   |   |   Age > 0.076923: Intermediate risk (3.0)
## |   |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   |   Systolic <= 0.090909: Low risk (9.0)
## |   |   |   |   |   |   |   Systolic > 0.090909
## |   |   |   |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   |   |   |   Age <= 0.282051
## |   |   |   |   |   |   |   |   |   |   Systolic <= 0.254545: Borderline risk (17.0/1.0)
## |   |   |   |   |   |   |   |   |   |   Systolic > 0.254545: Low risk (9.0/1.0)
## |   |   |   |   |   |   |   |   |   Age > 0.282051: Intermediate risk (5.0)
## |   |   |   |   |   |   |   |   isDiabetic > 0: Intermediate risk (7.0)
## |   |   |   |   |   Cholesterol > 0.828571
## |   |   |   |   |   |   isHypertensive <= 0: Borderline risk (13.0/1.0)
## |   |   |   |   |   |   isHypertensive > 0: Intermediate risk (4.0/1.0)
## |   |   |   |   Systolic > 0.536364
## |   |   |   |   |   Age <= 0.102564
## |   |   |   |   |   |   HDL <= 0.625
## |   |   |   |   |   |   |   Systolic <= 0.872727: Intermediate risk (9.0/1.0)
## |   |   |   |   |   |   |   Systolic > 0.872727: High risk (5.0/1.0)
## |   |   |   |   |   |   HDL > 0.625
## |   |   |   |   |   |   |   isDiabetic <= 0: Borderline risk (15.0/1.0)
## |   |   |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   |   |   isSmoker <= 0: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   isSmoker > 0: Borderline risk (5.0)
## |   |   |   |   |   Age > 0.102564
## |   |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   |   |   |   Cholesterol <= 0.642857: Low risk (9.0)
## |   |   |   |   |   |   |   |   |   Cholesterol > 0.642857: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   isMale > 0: Intermediate risk (2.0)
## |   |   |   |   |   |   |   isHypertensive > 0: Intermediate risk (8.0/1.0)
## |   |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   |   isSmoker <= 0: Intermediate risk (8.0/1.0)
## |   |   |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   |   |   |   isHypertensive <= 0: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   |   isHypertensive > 0: High risk (2.0)
## |   |   |   |   |   |   |   |   isMale > 0: High risk (6.0)
## |   Age > 0.333333
## |   |   Systolic <= 0.254545
## |   |   |   Cholesterol <= 0.828571
## |   |   |   |   HDL <= 0.825
## |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   Systolic <= 0.090909
## |   |   |   |   |   |   |   isDiabetic <= 0: Low risk (8.0)
## |   |   |   |   |   |   |   isDiabetic > 0: Intermediate risk (2.0)
## |   |   |   |   |   |   Systolic > 0.090909
## |   |   |   |   |   |   |   Cholesterol <= 0.228571
## |   |   |   |   |   |   |   |   Systolic <= 0.190909: Low risk (5.0)
## |   |   |   |   |   |   |   |   Systolic > 0.190909: Borderline risk (3.0)
## |   |   |   |   |   |   |   Cholesterol > 0.228571
## |   |   |   |   |   |   |   |   Systolic <= 0.218182
## |   |   |   |   |   |   |   |   |   HDL <= 0.475
## |   |   |   |   |   |   |   |   |   |   Age <= 0.410256: Borderline risk (5.0/1.0)
## |   |   |   |   |   |   |   |   |   |   Age > 0.410256: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   |   HDL > 0.475: Borderline risk (20.0)
## |   |   |   |   |   |   |   |   Systolic > 0.218182: Low risk (3.0/1.0)
## |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   |   |   HDL <= 0.2125
## |   |   |   |   |   |   |   |   Cholesterol <= 0.8: Intermediate risk (6.0)
## |   |   |   |   |   |   |   |   Cholesterol > 0.8: Borderline risk (3.0)
## |   |   |   |   |   |   |   HDL > 0.2125
## |   |   |   |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   |   |   |   HDL <= 0.2375: Borderline risk (5.0)
## |   |   |   |   |   |   |   |   |   HDL > 0.2375: Low risk (4.0)
## |   |   |   |   |   |   |   |   isDiabetic > 0: Borderline risk (9.0/1.0)
## |   |   |   |   |   |   isHypertensive > 0: Intermediate risk (8.0/1.0)
## |   |   |   |   HDL > 0.825
## |   |   |   |   |   Age <= 0.461538: Low risk (9.0)
## |   |   |   |   |   Age > 0.461538: Intermediate risk (2.0)
## |   |   |   Cholesterol > 0.828571
## |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   Age <= 0.461538: Low risk (3.0)
## |   |   |   |   |   Age > 0.461538: Intermediate risk (2.0)
## |   |   |   |   isDiabetic > 0: Intermediate risk (9.0)
## |   |   Systolic > 0.254545
## |   |   |   HDL <= 0.2
## |   |   |   |   isSmoker <= 0
## |   |   |   |   |   isMale <= 0: Intermediate risk (12.0/1.0)
## |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   isDiabetic <= 0: Intermediate risk (7.0/1.0)
## |   |   |   |   |   |   isDiabetic > 0: High risk (4.0)
## |   |   |   |   isSmoker > 0
## |   |   |   |   |   Systolic <= 0.354545: Intermediate risk (3.0)
## |   |   |   |   |   Systolic > 0.354545: High risk (13.0/2.0)
## |   |   |   HDL > 0.2
## |   |   |   |   isMale <= 0
## |   |   |   |   |   Cholesterol <= 0.814286
## |   |   |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   |   |   HDL <= 0.95
## |   |   |   |   |   |   |   |   Cholesterol <= 0.414286
## |   |   |   |   |   |   |   |   |   isBlack <= 0: Borderline risk (22.0)
## |   |   |   |   |   |   |   |   |   isBlack > 0
## |   |   |   |   |   |   |   |   |   |   Cholesterol <= 0.328571
## |   |   |   |   |   |   |   |   |   |   |   Age <= 0.435897: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   |   |   |   Age > 0.435897: Low risk (2.0/1.0)
## |   |   |   |   |   |   |   |   |   |   Cholesterol > 0.328571: Borderline risk (4.0)
## |   |   |   |   |   |   |   |   Cholesterol > 0.414286
## |   |   |   |   |   |   |   |   |   Systolic <= 0.709091
## |   |   |   |   |   |   |   |   |   |   Age <= 0.512821: Borderline risk (9.0/1.0)
## |   |   |   |   |   |   |   |   |   |   Age > 0.512821: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   |   Systolic > 0.709091: Intermediate risk (6.0)
## |   |   |   |   |   |   |   HDL > 0.95: Low risk (2.0)
## |   |   |   |   |   |   isHypertensive > 0
## |   |   |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   |   |   Cholesterol <= 0.214286: Borderline risk (7.0)
## |   |   |   |   |   |   |   |   Cholesterol > 0.214286
## |   |   |   |   |   |   |   |   |   HDL <= 0.55: Intermediate risk (5.0)
## |   |   |   |   |   |   |   |   |   HDL > 0.55: Low risk (4.0)
## |   |   |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   |   |   Systolic <= 0.827273: Intermediate risk (12.0)
## |   |   |   |   |   |   |   |   Systolic > 0.827273: High risk (2.0)
## |   |   |   |   |   Cholesterol > 0.814286
## |   |   |   |   |   |   Age <= 0.410256: Low risk (6.0/1.0)
## |   |   |   |   |   |   Age > 0.410256
## |   |   |   |   |   |   |   Systolic <= 0.581818: Intermediate risk (3.0)
## |   |   |   |   |   |   |   Systolic > 0.581818: High risk (4.0)
## |   |   |   |   isMale > 0
## |   |   |   |   |   Cholesterol <= 0.928571
## |   |   |   |   |   |   isDiabetic <= 0: Intermediate risk (34.0/3.0)
## |   |   |   |   |   |   isDiabetic > 0
## |   |   |   |   |   |   |   HDL <= 0.6875: High risk (10.0/1.0)
## |   |   |   |   |   |   |   HDL > 0.6875
## |   |   |   |   |   |   |   |   isSmoker <= 0
## |   |   |   |   |   |   |   |   |   Cholesterol <= 0.257143
## |   |   |   |   |   |   |   |   |   |   isHypertensive <= 0: Intermediate risk (2.0)
## |   |   |   |   |   |   |   |   |   |   isHypertensive > 0: Borderline risk (4.0)
## |   |   |   |   |   |   |   |   |   Cholesterol > 0.257143: Intermediate risk (6.0)
## |   |   |   |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   |   |   |   Cholesterol <= 0.314286: Intermediate risk (3.0)
## |   |   |   |   |   |   |   |   |   Cholesterol > 0.314286: High risk (3.0)
## |   |   |   |   |   Cholesterol > 0.928571
## |   |   |   |   |   |   isHypertensive <= 0: Intermediate risk (2.0)
## |   |   |   |   |   |   isHypertensive > 0: Borderline risk (7.0)
## Age > 0.564103
## |   Systolic <= 0.490909
## |   |   isDiabetic <= 0
## |   |   |   Age <= 0.692308
## |   |   |   |   HDL <= 0.1125
## |   |   |   |   |   isHypertensive <= 0: Low risk (2.0)
## |   |   |   |   |   isHypertensive > 0: High risk (3.0)
## |   |   |   |   HDL > 0.1125
## |   |   |   |   |   isSmoker <= 0
## |   |   |   |   |   |   Systolic <= 0.181818: Low risk (6.0/1.0)
## |   |   |   |   |   |   Systolic > 0.181818
## |   |   |   |   |   |   |   Age <= 0.589744: Intermediate risk (3.0/1.0)
## |   |   |   |   |   |   |   Age > 0.589744: Borderline risk (27.0)
## |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   Systolic <= 0.081818: Borderline risk (4.0)
## |   |   |   |   |   |   Systolic > 0.081818: Intermediate risk (11.0/1.0)
## |   |   |   Age > 0.692308
## |   |   |   |   HDL <= 0.975
## |   |   |   |   |   Age <= 0.769231
## |   |   |   |   |   |   Cholesterol <= 0.057143: Borderline risk (5.0/1.0)
## |   |   |   |   |   |   Cholesterol > 0.057143: Intermediate risk (14.0)
## |   |   |   |   |   Age > 0.769231
## |   |   |   |   |   |   isSmoker <= 0
## |   |   |   |   |   |   |   Systolic <= 0.427273: Intermediate risk (25.0/3.0)
## |   |   |   |   |   |   |   Systolic > 0.427273: High risk (3.0)
## |   |   |   |   |   |   isSmoker > 0
## |   |   |   |   |   |   |   Systolic <= 0.227273
## |   |   |   |   |   |   |   |   isBlack <= 0: Intermediate risk (3.0)
## |   |   |   |   |   |   |   |   isBlack > 0: High risk (3.0/1.0)
## |   |   |   |   |   |   |   Systolic > 0.227273: High risk (10.0)
## |   |   |   |   HDL > 0.975: Borderline risk (5.0)
## |   |   isDiabetic > 0
## |   |   |   isSmoker <= 0
## |   |   |   |   Age <= 0.666667: Intermediate risk (16.0/1.0)
## |   |   |   |   Age > 0.666667
## |   |   |   |   |   isMale <= 0
## |   |   |   |   |   |   Age <= 0.923077
## |   |   |   |   |   |   |   HDL <= 0.6: Intermediate risk (10.0/1.0)
## |   |   |   |   |   |   |   HDL > 0.6: High risk (5.0/1.0)
## |   |   |   |   |   |   Age > 0.923077: High risk (4.0)
## |   |   |   |   |   isMale > 0
## |   |   |   |   |   |   isBlack <= 0: High risk (4.0)
## |   |   |   |   |   |   isBlack > 0
## |   |   |   |   |   |   |   Age <= 0.769231: High risk (4.0)
## |   |   |   |   |   |   |   Age > 0.769231: Intermediate risk (4.0/1.0)
## |   |   |   isSmoker > 0
## |   |   |   |   isHypertensive <= 0
## |   |   |   |   |   isBlack <= 0
## |   |   |   |   |   |   Age <= 0.692308: Intermediate risk (5.0)
## |   |   |   |   |   |   Age > 0.692308: High risk (4.0)
## |   |   |   |   |   isBlack > 0: High risk (6.0)
## |   |   |   |   isHypertensive > 0: High risk (32.0)
## |   Systolic > 0.490909
## |   |   Age <= 0.589744
## |   |   |   isDiabetic <= 0: Borderline risk (4.0/1.0)
## |   |   |   isDiabetic > 0: High risk (7.0/1.0)
## |   |   Age > 0.589744
## |   |   |   isSmoker <= 0
## |   |   |   |   isMale <= 0
## |   |   |   |   |   Age <= 0.666667: Intermediate risk (5.0/1.0)
## |   |   |   |   |   Age > 0.666667
## |   |   |   |   |   |   isDiabetic <= 0
## |   |   |   |   |   |   |   Systolic <= 0.809091: Intermediate risk (5.0/1.0)
## |   |   |   |   |   |   |   Systolic > 0.809091: High risk (7.0)
## |   |   |   |   |   |   isDiabetic > 0: High risk (16.0)
## |   |   |   |   isMale > 0: High risk (37.0/1.0)
## |   |   |   isSmoker > 0: High risk (86.0)
## 
## Number of Leaves  :  153
## 
## Size of the tree :   305
# Plot the J48 decision tree
plot(C45Fit)

# Make predictions using the J48 model on the test data
testPred <- predict(C45Fit, newdata = testData)

# Create a confusion matrix
conf_matrix <- table(testPred, testData$Risk)

# Display the confusion matrix
print(conf_matrix)
##                    
## testPred            Low risk Borderline risk Intermediate risk High risk
##   Low risk                61               0                 4         0
##   Borderline risk          8              91                 5         3
##   Intermediate risk        6               0                50        19
##   High risk                0               1                14        54
# Calculate performance metrics
accuracy_G3 <- sum((diag(conf_matrix)) / sum(conf_matrix))
error_rate_G3 <-( 1 - accuracy_G3)
sensitivity_G3 <- conf_matrix[4, 4] / sum(conf_matrix[4, ])
specificity_G3 <- sum(diag(conf_matrix[-4, -4])) / sum(conf_matrix[-4, ])
precision_G3 <- conf_matrix[4, 4] / sum(conf_matrix[, 4])

accuracy <- sum(testPred == testData$Risk) / length(testPred)
# Display performance metrics
cat("Accuracy: ", accuracy_G3, "\n")
## Accuracy:  0.8101266
cat("Error Rate: ", error_rate_G3, "\n")
## Error Rate:  0.1898734
cat("Sensitivity (Recall): ", sensitivity_G3, "\n")
## Sensitivity (Recall):  0.7826087
cat("Specificity: ", specificity_G3, "\n")
## Specificity:  0.8178138
cat("Precision: ", precision_G3, "\n")
## Precision:  0.7105263

Analysis:

The C4.5 decision tree, employing the gain ratio criterion, demonstrates a commendable accuracy of 81.01%. With a substantial tree size of 305 and 153 leaves, the model captures nuanced relationships within the dataset. Its predictive prowess is evident in the balanced sensitivity (78.26%) and specificity (81.78%), highlighting its ability to correctly identify both positive and negative instances. The precision of 71.05% emphasizes the accuracy of positive predictions. This collectively positions the C4.5 decision tree as a robust and effective choice for classification on our dataset, showcasing its capability to achieve high accuracy and reliable predictions.

After we have created a decision tree using the Gain ratio of three different sizes, we will now calculate the comparison between the three models

# Create data frames for each model's summary
summary_c4.5_1 <- data.frame(
  Model = "60% training, 40% testing",
  Accuracy = 78.38,
  Sensitivity = 73.72,
  Specificity = 79.92,
  Precision = 74.19
)

summary_c4.5_2 <- data.frame(
  Model = "70% training, 30% testing",
  Accuracy = 79.39,
  Sensitivity = 78.0,
  Specificity = 79.78,
  Precision = 72.90
)

summary_c4.5_3 <- data.frame(
  Model = "80% training, 20% testing",
  Accuracy = 81.01,
  Sensitivity = 78.26,
  Specificity = 81.78,
  Precision = 71.05
)

# Combine the summaries into a single data frame
comparison_table <- rbind(summary_c4.5_1, summary_c4.5_2, summary_c4.5_3)

# Print the comparison table
print(comparison_table)
##                       Model Accuracy Sensitivity Specificity Precision
## 1 60% training, 40% testing    78.38       73.72       79.92     74.19
## 2 70% training, 30% testing    79.39       78.00       79.78     72.90
## 3 80% training, 20% testing    81.01       78.26       81.78     71.05

In our exploration of decision tree models—specifically, C4.5 with varying numbers of training-testing —we aimed to identify the optimal configuration for accurate and reliable predictions. The results indicate that the model with (80% training, 20% testing) stands out, achieving the highest accuracy at 81.01%. This particular configuration strikes a balance between sensitivity (78.26%), specificity (81.78%), and precision (71.05%), making it a robust choice for the classification task at hand.

It’s noteworthy that the model with (70% training, 30% testing) also performs well, showcasing competitive accuracy (79.39%) and a balanced trade-off between sensitivity and specificity. However, the model with (60% training, 40% testing) surpasses it, demonstrating superior sensitivity and precision.

In contrast, the model with (60% training, 40% testing), while achieving a respectable accuracy of 78.38%, exhibits slightly lower sensitivity and precision. This suggests that a more complex tree structure, as seen in the model with (80% training, 20% testing), contributes to better capturing the underlying patterns in the data.

In conclusion, the C4.5 decision tree with (80% training, 20% testing) emerges as the preferred model for this specific dataset and classification task. Its superior performance in terms of accuracy, sensitivity, specificity, and precision underscores its suitability for making reliable predictions.

Decision tree using Information gain

-For the construction of our decision tree model, we have opted for the C5.0 algorithm, a sophisticated and versatile tool known for its proficiency in handling classification tasks. Specifically, we harness the power of information gain as the guiding criterion within C5.0. This choice is deliberate, as information gain allows the algorithm to discern the most relevant and discriminative features in our dataset, facilitating the creation of a decision tree that excels at capturing intricate patterns and relationships.

1-partition the data into ( 60% training, 40% testing):

set.seed(1234)
ind=sample (2, nrow(balanced_data), replace=TRUE, prob=c(0.60 , 0.40))
trainData=balanced_data[ind==1,]
testData=balanced_data[ind==2,]
dim(trainData)
## [1] 959  10
dim(testData)
## [1] 629  10
# install.packages("C50")
library(C50)

# Define the formula
myFormula <- Risk ~ .

# Build the C5.0 decision tree on the training data with information gain
c50_model <- C5.0(myFormula, data = trainData)

# Plot the decision tree
plot(c50_model)

# Display a summary of the decision tree
print(c50_model)
## 
## Call:
## C5.0.formula(formula = myFormula, data = trainData)
## 
## Classification Tree
## Number of samples: 959 
## Number of predictors: 9 
## 
## Tree size: 105 
## 
## Non-standard options: attempt to group attributes
# Make predictions using the C5.0 model on the test data
testPred <- predict(c50_model, newdata = testData)

# Create a confusion matrix
conf_matrix <- table(testPred, testData$Risk)

# Display the confusion matrix
print(conf_matrix)
##                    
## testPred            Low risk Borderline risk Intermediate risk High risk
##   Low risk               125               4                 5         1
##   Borderline risk         19             165                28        12
##   Intermediate risk        6               0                85        26
##   High risk                1               7                29       116
# Calculate performance metrics
accuracy_I1 <- sum((diag(conf_matrix)) / sum(conf_matrix))
error_rate_I1 <-( 1 - accuracy_I1)
sensitivity_I1 <- conf_matrix[4, 4] / sum(conf_matrix[4, ])
specificity_I1 <- sum(diag(conf_matrix[-4, -4])) / sum(conf_matrix[-4, ])
precision_I1 <- conf_matrix[4, 4] / sum(conf_matrix[, 4])

# Display performance metrics
cat("Accuracy: ", accuracy_I1, "\n")
## Accuracy:  0.7806041
cat("Error Rate: ", error_rate_I1, "\n")
## Error Rate:  0.2193959
cat("Sensitivity (Recall): ", sensitivity_I1, "\n")
## Sensitivity (Recall):  0.7581699
cat("Specificity: ", specificity_I1, "\n")
## Specificity:  0.7878151
cat("Precision: ", precision_I1, "\n")
## Precision:  0.7483871

Analysis: The C5 model demonstrates strong predictive capabilities with an accuracy of 78.37%. It effectively identifies instances of low risk (sensitivity of 80.6%) and maintains high specificity (77.6%) in recognizing non-low-risk instances. The precision of 72.26% highlights the accuracy of positive predictions. The model’s tree structure, comprising 120 nodes, reflects its complexity in capturing patterns within the data. These results suggest a well-balanced model with the potential for reliable classification across multiple risk categories.

2-partition the data into ( 70% training, 30% testing):

set.seed(1234)
ind=sample (2, nrow(balanced_data), replace=TRUE, prob=c(0.70 , 0.30))
trainData=balanced_data[ind==1,]
testData=balanced_data[ind==2,]
# install.packages("C50")
library(C50)

# Define the formula
myFormula <- Risk ~ .

# Build the C5.0 decision tree on the training data with information gain
c50_model <- C5.0(myFormula, data = trainData)

# Plot the decision tree
plot(c50_model)

# Display a summary of the decision tree
print(c50_model)
## 
## Call:
## C5.0.formula(formula = myFormula, data = trainData)
## 
## Classification Tree
## Number of samples: 1132 
## Number of predictors: 9 
## 
## Tree size: 135 
## 
## Non-standard options: attempt to group attributes
# Make predictions using the C5.0 model on the test data
testPred <- predict(c50_model, newdata = testData)

# Create a confusion matrix
conf_matrix <- table(testPred, testData$Risk)

# Display the confusion matrix
print(conf_matrix)
##                    
## testPred            Low risk Borderline risk Intermediate risk High risk
##   Low risk                97               2                 9         2
##   Borderline risk          7             124                11         4
##   Intermediate risk       12               0                64        22
##   High risk                0               0                23        79
# Calculate performance metrics
accuracy_I2 <- sum((diag(conf_matrix)) / sum(conf_matrix))
error_rate_I2 <-( 1 - accuracy_I2)
sensitivity_I2 <- conf_matrix[4, 4] / sum(conf_matrix[4, ])
specificity_I2 <- sum(diag(conf_matrix[-4, -4])) / sum(conf_matrix[-4, ])
precision_I2 <- conf_matrix[4, 4] / sum(conf_matrix[, 4])

# Display performance metrics
cat("Accuracy: ", accuracy_I2, "\n")
## Accuracy:  0.7982456
cat("Error Rate: ", error_rate_I2, "\n")
## Error Rate:  0.2017544
cat("Sensitivity (Recall): ", sensitivity_I2, "\n")
## Sensitivity (Recall):  0.7745098
cat("Specificity: ", specificity_I2, "\n")
## Specificity:  0.8050847
cat("Precision: ", precision_I2, "\n")
## Precision:  0.7383178

Analysis: The C5 model achieved an accuracy of 78.07%, demonstrating its proficiency in making correct predictions across all classes. It exhibits robust sensitivity (77.23%), effectively identifying instances of high risk. The model’s specificity (78.31%) suggests improved accuracy in correctly identifying non-high-risk instances compared to the previous configuration. The precision of 72.90% reflects the accuracy of positive predictions. The tree structure, comprising 125 nodes, signifies a moderate level of complexity. Overall, the model performs well, with enhanced specificity, showcasing its suitability for this classification task.

3-partition the data into ( 80% training, 20% testing):sting):

set.seed(1234)
ind=sample (2, nrow(balanced_data), replace=TRUE, prob=c(0.80 , 0.20))
trainData=balanced_data[ind==1,]
testData=balanced_data[ind==2,]
# install.packages("C50")
library(C50)

# Define the formula
myFormula <- Risk ~ .

# Build the C5.0 decision tree on the training data with information gain
c50_model <- C5.0(myFormula, data = trainData)

# Plot the decision tree
plot(c50_model)

# Display a summary of the decision tree
print(c50_model)
## 
## Call:
## C5.0.formula(formula = myFormula, data = trainData)
## 
## Classification Tree
## Number of samples: 1272 
## Number of predictors: 9 
## 
## Tree size: 155 
## 
## Non-standard options: attempt to group attributes
# Make predictions using the C5.0 model on the test data
testPred <- predict(c50_model, newdata = testData)

# Create a confusion matrix
conf_matrix <- table(testPred, testData$Risk)

# Display the confusion matrix
print(conf_matrix)
##                    
## testPred            Low risk Borderline risk Intermediate risk High risk
##   Low risk                61               0                 4         0
##   Borderline risk          8              90                 4         3
##   Intermediate risk        6               1                52        16
##   High risk                0               1                13        57
# Calculate performance metrics
accuracy_I3 <- sum((diag(conf_matrix)) / sum(conf_matrix))
error_rate_I3 <-( 1 - accuracy_I3)
sensitivity_I3 <- conf_matrix[4, 4] / sum(conf_matrix[4, ])
specificity_I3 <- sum(diag(conf_matrix[-4, -4])) / sum(conf_matrix[-4, ])
precision_I3 <- conf_matrix[4, 4] / sum(conf_matrix[, 4])

# Display performance metrics
cat("Accuracy: ", accuracy_I3, "\n")
## Accuracy:  0.8227848
cat("Error Rate: ", error_rate_I3, "\n")
## Error Rate:  0.1772152
cat("Sensitivity (Recall): ", sensitivity_I3, "\n")
## Sensitivity (Recall):  0.8028169
cat("Specificity: ", specificity_I3, "\n")
## Specificity:  0.8285714
cat("Precision: ", precision_I3, "\n")   
## Precision:  0.75

Analysis: The C5 model achieved an accuracy of 64.42%, showcasing its ability to make correct predictions across all classes. It exhibits strong sensitivity (73.03%), effectively identifying instances of high risk. However, the model’s specificity (57.98%) suggests potential for improvement in correctly identifying non-high-risk instances. The precision of 84.42% reflects the accuracy of positive predictions. The tree structure, comprising 92 nodes, indicates a moderate level of complexity. While the model performs reasonably well, there may be opportunities for refinement, particularly in specificity. Overall,The model’s strength lies in identifying clear cases (Low and High risk) .

After we have created a decision tree using the Information gain of three different sizes, we will now calculate the comparison between the three models

# Create data frames for each model's summary
summary1 <- data.frame(
  Model = "60%training 40%testing",
  Accuracy = 78.37,
  Sensitivity = 80.6,
  Specificity = 77.6,
  Precision = 72.26
)

summary2 <- data.frame(
  Model = "70%training 30%testing",
  Accuracy =  78.07,
  Sensitivity = 77.23,
  Specificity = 78.31,
  Precision = 72.90
)

summary3 <- data.frame(
  Model = "80%training 20%testing",
  Accuracy = 64.42,
  Sensitivity = 73.03,
  Specificity = 57.98,
  Precision = 84.42
)

# Combine the summaries into a single data frame
comparison_table <- rbind(summary1, summary2, summary3)

# Print the comparison table
print(comparison_table)
##                    Model Accuracy Sensitivity Specificity Precision
## 1 60%training 40%testing    78.37       80.60       77.60     72.26
## 2 70%training 30%testing    78.07       77.23       78.31     72.90
## 3 80%training 20%testing    64.42       73.03       57.98     84.42

Analysis:

  • All three C5 models exhibit distinctive performance metrics. The 60% training 40% testing model achieves an accuracy of 64.90%, with a sensitivity of 75.20% and specificity of 57.63%. The 70% training 30% testing model shows an accuracy of 64.42%, sensitivity of 73.03%, and specificity of 57.98%. The 80% training 20% testing model has an accuracy of 64.42%, sensitivity of 73.03%, and specificity of 57.98%. While accuracies are comparable, the 70% training 30% testing model displays slightly higher sensitivity and precision, making it a potential choice.

Conclusion:

  • Selecting the optimal C5 model depends on specific objectives. If prioritizing sensitivity and precision, the 70% training 30% testing model stands out. However, overall performance improvements could be achieved by refining model parameters or exploring additional features. Further optimization is recommended to enhance specificity across all models.

Decision Tree using Gini Index

Opting for RPART with the Gini index involves building a decision tree that maximizes class separation by minimizing impurity. This method, rooted in recursive partitioning, aims to create nodes that group similar instances based on the Gini impurity criterion.

1-partition the data into ( 60% training, 40% testing):

set.seed(1234)
ind=sample (2, nrow(balanced_data), replace=TRUE, prob=c(0.60 , 0.40))
trainData=balanced_data[ind==1,]
testData=balanced_data[ind==2,]
dim(trainData)
## [1] 959  10
dim(testData)
## [1] 629  10
#train using the trainData and create the rpart gini index tree
library('rpart')
library('rpart.plot')
library(caret)
tree <- rpart(myFormula, data = trainData,method = 'class')
rpart.plot(tree) 

# Make predictions using the RPART model on the test data
test_pred <- predict(tree, newdata = testData, type = "class")

# Create a confusion matrix
conf_matrix_rpart <- table(test_pred, testData$Risk)

# Display the confusion matrix
print(conf_matrix_rpart)
##                    
## test_pred           Low risk Borderline risk Intermediate risk High risk
##   Low risk               107              48                16         4
##   Borderline risk         31              71                28         4
##   Intermediate risk       11              53                65        36
##   High risk                2               4                38       111
# Calculate performance metrics
accuracy_D1 <- sum(diag(conf_matrix)) / sum(conf_matrix)
error_rate_D1 <- 1 - accuracy_D1
sensitivity_D1 <- conf_matrix[2, 2] / sum(conf_matrix[2, ])
specificity_D1 <- sum(diag(conf_matrix[-2, -2])) / sum(conf_matrix[-2, ])
precision_D1 <- conf_matrix[2, 2] / sum(conf_matrix[, 2])


# Display performance metrics
cat("Accuracy: ", accuracy_D1, "\n")
## Accuracy:  0.8227848
cat("Error Rate: ", error_rate_D1, "\n")
## Error Rate:  0.1772152
cat("Sensitivity (Recall): ", sensitivity_D1, "\n")
## Sensitivity (Recall):  0.8571429
cat("Specificity: ", specificity_D1, "\n")
## Specificity:  0.8056872
cat("Precision: ", precision_D1, "\n")
## Precision:  0.9782609

Analysis:

The results obtained from the rpart model showcase a balanced performance across various risk categories. The model achieved an overall accuracy of 57.39%, indicating its ability to make correct predictions across all classes. Sensitivity, measuring the model’s capability to identify positive instances, is at 50.81%, demonstrating a reasonable ability to detect true positives. Specificity stands at 60.14%, indicating the model’s proficiency in correctly identifying negative instances. The precision of 53.41% signifies the accuracy of positive predictions.

2-partition the data into ( 70% training, 30% testing):

set.seed(1234)
ind=sample (2, nrow(balanced_data), replace=TRUE, prob=c(0.70 , 0.30))
trainData=balanced_data[ind==1,]
testData=balanced_data[ind==2,]
#train using the trainData and create the rpart gini index tree
library('rpart')
library('rpart.plot')
tree <- rpart(myFormula, data = trainData,method = 'class')
rpart.plot(tree) 

# Make predictions using the RPART model on the test data
test_pred <- predict(tree, newdata = testData, type = "class")

# Create a confusion matrix
conf_matrix <- table(test_pred, testData$Risk)

# Display the confusion matrix
print(conf_matrix)
##                    
## test_pred           Low risk Borderline risk Intermediate risk High risk
##   Low risk                75              12                15         3
##   Borderline risk         34              81                21         1
##   Intermediate risk        5              31                54        37
##   High risk                2               2                17        66
# Calculate performance metrics
accuracy_D2 <- sum(diag(conf_matrix)) / sum(conf_matrix)
error_rate_D2 <- 1 - accuracy_D2
sensitivity_D2 <- conf_matrix[2, 2] / sum(conf_matrix[2, ])
specificity_D2 <- sum(diag(conf_matrix[-2, -2])) / sum(conf_matrix[-2, ])
precision_D2 <- conf_matrix[2, 2] / sum(conf_matrix[, 2])


# Display performance metrics
cat("Accuracy: ", accuracy_D2, "\n")
## Accuracy:  0.6052632
cat("Error Rate: ", error_rate_D2, "\n")
## Error Rate:  0.3947368
cat("Sensitivity (Recall): ", sensitivity_D2, "\n")
## Sensitivity (Recall):  0.5912409
cat("Specificity: ", specificity_D2, "\n")
## Specificity:  0.6112853
cat("Precision: ", precision_D2, "\n")
## Precision:  0.6428571

Analysis:

The results from the RPART model reveal a well-balanced performance across different risk categories. The model achieved an overall accuracy of 60.31%, indicating its proficiency in making accurate predictions across all classes. Notably, it demonstrated a sensitivity of 55.10%, effectively identifying instances of low risk, and a specificity of 62.78%, accurately recognizing non-low-risk instances. The precision of 64.29% underscores the model’s accuracy in positive predictions.

3-partition the data into ( 80% training, 20% testing):

set.seed(1234)
ind=sample (2, nrow(balanced_data), replace=TRUE, prob=c(0.80 , 0.20))
trainData=balanced_data[ind==1,]
testData=balanced_data[ind==2,]
#train using the trainData and create the rpart gini index tree
library('rpart')
library('rpart.plot')
tree <- rpart(myFormula, data = trainData,method = 'class')
rpart.plot(tree) 

# Make predictions using the RPART model on the test data
test_pred <- predict(tree, newdata = testData, type = "class")

# Create a confusion matrix
conf_matrix <- table(test_pred, testData$Risk)

# Display the confusion matrix
print(conf_matrix)
##                    
## test_pred           Low risk Borderline risk Intermediate risk High risk
##   Low risk                50              28                 9         3
##   Borderline risk         19              41                16         0
##   Intermediate risk        4              22                36        28
##   High risk                2               1                12        45
# Calculate performance metrics
accuracy_D3 <- sum(diag(conf_matrix)) / sum(conf_matrix)
error_rate_D3 <- 1 - accuracy_D3
sensitivity_D3 <- conf_matrix[2, 2] / sum(conf_matrix[2, ])
specificity_D3 <- sum(diag(conf_matrix[-2, -2])) / sum(conf_matrix[-2, ])
precision_D3 <- conf_matrix[2, 2] / sum(conf_matrix[, 2])


# Display performance metrics
cat("Accuracy: ", accuracy_D3, "\n")
## Accuracy:  0.5443038
cat("Error Rate: ", error_rate_D3, "\n")
## Error Rate:  0.4556962
cat("Sensitivity (Recall): ", sensitivity_D3, "\n")
## Sensitivity (Recall):  0.5394737
cat("Specificity: ", specificity_D3, "\n")
## Specificity:  0.5458333
cat("Precision: ", precision_D3, "\n") 
## Precision:  0.4456522

Analysis:

The outcomes of the RPART model showcase a discernible performance across distinct risk categories. The model achieved an overall accuracy of 54.75%, highlighting its capability to make correct predictions across all classes. Specifically, it demonstrated a sensitivity of 51.02%, effectively identifying instances of low risk, and a specificity of 56.42%, accurately recognizing non-low-risk instances. The precision of 54.35% emphasizes the model’s accuracy in positive predictions.

After we have created a decision tree using the Gini index of three different sizes, we will now calculate the comparison between the three models

# Create data frames for each summary
summary1 <- data.frame(
  Model = "60% training 40% testing",
  Accuracy = 57.39,
  Sensitivity = 50.81,
  Specificity = 60.14,
  Precision = 53.41
)

summary2 <- data.frame(
  Model = "70% training, 30% testing",
  Accuracy = 60.31,
  Sensitivity = 55.10,
  Specificity = 62.78,
  Precision = 64.29
)

summary3 <- data.frame(
  Model = " 80% training 20% testing",
  Accuracy = 54.75,
  Sensitivity = 51.02,
  Specificity = 56.42,
  Precision = 54.35
)


# Combine summaries into a single data frame
comparison_table <- rbind(summary1, summary2, summary3)

# Print the comparison table
print(comparison_table)
##                       Model Accuracy Sensitivity Specificity Precision
## 1  60% training 40% testing    57.39       50.81       60.14     53.41
## 2 70% training, 30% testing    60.31       55.10       62.78     64.29
## 3  80% training 20% testing    54.75       51.02       56.42     54.35

Observations:

  • The model trained with 70% of the data for training and 30% for testing exhibits the highest overall performance with the highest accuracy, sensitivity, specificity, and precision.

  • The 60% training and 40% testing model follows closely with competitive metrics across all categories.

  • The 80% training and 20% testing model lags behind in accuracy and precision but maintains moderate performance in sensitivity and specificity.

Conclusion: Considering the three models, the 70% training and 30% testing model stands out as the most effective, striking a balance between accuracy, sensitivity, specificity, and precision. It outperforms the other two models, demonstrating its robustness in handling different proportions of training and testing data

Classification conclusion:

the C4.5 model using Gain ratio emerged as the preferred choice. The C4.5 model exhibited superior predictive performance with a higher accuracy of 81.78% in the (80% training, 20% testing) partitioning , sensitivity, specificity, and precision compared to the other models. The decision to favor C4.5 is grounded in its ability to capture both positive and negative instances effectively, making it well-suited for the dataset characteristics. The model’s strength lies in identifying clear cases (Low and High risk).

7- Clustering

Clustering models are utilized to group data into distinct clusters or groups. In our case, we will apply the k-means clustering algorithm to our dataset and interpret the results, taking into consideration our knowledge of the class label.

Certain factors can impact the efficacy of the final clusters formed when using k-means clustering that we have to be aware. For instance, outliers: Cluster formation is very sensitive to the presence of outliers as that they can pull the cluster towards itself, thus affecting optimal cluster formation. However, we have already addressed this concern in earlier steps.

First we have to remove target class:

cdataset = subset(dataset, select = -c(Risk))

We can now use the rest of the attributes for clustering.

Check our data set type:

The checking is because K-Means algorithm does not work with categorical data.

# 1- view
str(cdataset)
## 'data.frame':    1000 obs. of  9 variables:
##  $ isMale        : int  1 0 0 1 0 0 1 1 0 1 ...
##  $ isBlack       : int  1 0 1 1 0 0 0 0 0 0 ...
##  $ isSmoker      : int  0 0 1 1 1 1 1 1 1 0 ...
##  $ isDiabetic    : int  1 1 1 1 0 0 0 1 0 1 ...
##  $ isHypertensive: int  1 1 1 0 1 1 0 0 1 1 ...
##  $ Age           : num  0.2308 0.7436 0.2564 0.0513 0.6667 ...
##  $ Systolic      : num  0.1 0.7 0.827 0.5 0.4 ...
##  $ Cholesterol   : num  0.729 0.357 0.243 0.514 0.986 ...
##  $ HDL           : num  0.15 0.487 0.487 0.325 0.537 ...

It’s clear that all 9 variables are numeric of type integer so we can start working on it with no issues.

Determining the optimal number of clusters:

library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
cdataset <- scale(cdataset)
fviz_nbclust(cdataset, kmeans, method = "silhouette")+ labs(subtitle = "silhouette method")

According to silhouette method best number of clusters is K = 2 so will test it along with other high points such as k=4 , k=8.

Clustering K= 2:

As we don’t want the clustering algorithm to depend to an arbitrary variable unit, we start by scaling/standardizing the data:
# 2- prepreocessing 
#Data types should be transformed into numeric types before clustering.
cdataset <- scale(cdataset)

K-means:

K-means algorithm is non-deterministic, meaning that the clustering outcome can vary each time the algorithm is executed, even when applied to the same dataset. To address this, we will set a seed for the random number generation, ensuring that the results can be reproduced consistently.

# 3- run k-means clustering to find 2 clusters
#set a seed for random number generation  to make the results reproducible
set.seed(8953)
kmeans.result <- kmeans(cdataset,2)
# print the clusterng result
kmeans.result
## K-means clustering with 2 clusters of sizes 516, 484
## 
## Cluster means:
##        isMale     isBlack   isSmoker  isDiabetic isHypertensive         Age
## 1 -0.02262886 -0.04843516  0.9680116  0.02577952   -0.001627174 -0.02976937
## 2  0.02412499  0.05163749 -1.0320124 -0.02748395    0.001734756  0.03173759
##      Systolic Cholesterol          HDL
## 1  0.04730009 -0.01946460 -0.007645875
## 2 -0.05042737  0.02075152  0.008151387
## 
## Clustering vector:
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
##    2    2    1    1    1    1    1    1    1    2    1    2    2    1    2    1 
##   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
##    2    1    1    2    1    2    1    1    2    2    2    2    1    2    2    2 
##   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48 
##    2    1    1    2    1    2    1    2    1    1    2    2    2    2    1    1 
##   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64 
##    2    1    1    1    2    1    1    2    1    2    2    2    1    1    1    2 
##   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79   80 
##    1    2    2    1    1    2    1    1    1    1    2    2    2    2    1    1 
##   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   96 
##    1    1    2    2    2    2    2    2    1    2    1    1    2    1    1    1 
##   97   98   99  100  101  102  103  104  105  106  107  108  109  110  111  112 
##    2    2    1    1    2    1    2    1    1    2    2    2    2    1    1    1 
##  113  114  115  116  117  118  119  120  121  122  123  124  125  126  127  128 
##    1    1    2    2    2    2    1    1    1    2    1    2    2    1    2    2 
##  129  130  131  132  133  134  135  136  137  138  139  140  141  142  143  144 
##    1    1    2    2    1    1    2    2    1    1    2    1    2    2    1    1 
##  145  146  147  148  149  150  151  152  153  154  155  156  157  158  159  160 
##    2    2    1    2    2    1    1    1    1    2    1    1    2    2    2    1 
##  161  162  163  164  165  166  167  168  169  170  171  172  173  174  175  176 
##    2    1    1    1    1    1    2    2    1    1    2    1    2    1    1    1 
##  177  178  179  180  181  182  183  184  185  186  187  188  189  190  191  192 
##    1    2    2    2    2    2    2    2    1    2    2    2    1    1    2    2 
##  193  194  195  196  197  198  199  200  201  202  203  204  205  206  207  208 
##    2    2    2    1    1    2    2    2    2    2    2    2    1    1    2    1 
##  209  210  211  212  213  214  215  216  217  218  219  220  221  222  223  224 
##    1    2    1    1    1    1    1    2    2    1    2    1    2    1    1    1 
##  225  226  227  228  229  230  231  232  233  234  235  236  237  238  239  240 
##    1    1    2    2    1    1    2    2    2    2    2    1    1    2    1    1 
##  241  242  243  244  245  246  247  248  249  250  251  252  253  254  255  256 
##    1    2    1    2    2    2    2    1    2    2    2    2    1    1    2    1 
##  257  258  259  260  261  262  263  264  265  266  267  268  269  270  271  272 
##    1    1    2    1    2    1    2    2    2    2    1    2    2    1    1    2 
##  273  274  275  276  277  278  279  280  281  282  283  284  285  286  287  288 
##    2    2    2    1    1    1    1    1    1    2    1    2    2    1    1    2 
##  289  290  291  292  293  294  295  296  297  298  299  300  301  302  303  304 
##    2    2    1    2    1    1    1    1    2    2    2    2    1    1    2    1 
##  305  306  307  308  309  310  311  312  313  314  315  316  317  318  319  320 
##    2    2    1    1    1    1    1    1    2    1    1    1    1    1    1    2 
##  321  322  323  324  325  326  327  328  329  330  331  332  333  334  335  336 
##    1    2    1    2    2    1    1    1    2    2    1    2    2    1    2    2 
##  337  338  339  340  341  342  343  344  345  346  347  348  349  350  351  352 
##    1    1    2    2    2    2    2    2    1    1    2    2    2    1    1    2 
##  353  354  355  356  357  358  359  360  361  362  363  364  365  366  367  368 
##    2    2    1    1    2    2    2    2    2    2    2    2    1    1    2    1 
##  369  370  371  372  373  374  375  376  377  378  379  380  381  382  383  384 
##    2    1    2    2    1    2    2    1    1    2    1    2    2    2    2    1 
##  385  386  387  388  389  390  391  392  393  394  395  396  397  398  399  400 
##    1    2    2    2    1    1    1    2    1    1    1    1    2    1    2    1 
##  401  402  403  404  405  406  407  408  409  410  411  412  413  414  415  416 
##    1    2    1    1    2    2    1    1    1    1    2    2    2    2    1    1 
##  417  418  419  420  421  422  423  424  425  426  427  428  429  430  431  432 
##    1    1    1    1    2    2    1    2    2    1    2    2    2    1    1    2 
##  433  434  435  436  437  438  439  440  441  442  443  444  445  446  447  448 
##    1    2    1    2    2    1    1    1    1    2    1    2    1    1    2    2 
##  449  450  451  452  453  454  455  456  457  458  459  460  461  462  463  464 
##    1    1    2    1    2    1    2    1    2    1    1    2    1    1    1    2 
##  465  466  467  468  469  470  471  472  473  474  475  476  477  478  479  480 
##    1    1    1    1    1    1    1    1    1    1    2    2    2    1    2    1 
##  481  482  483  484  485  486  487  488  489  490  491  492  493  494  495  496 
##    1    2    2    1    1    1    2    1    1    2    1    2    1    1    2    1 
##  497  498  499  500  501  502  503  504  505  506  507  508  509  510  511  512 
##    1    1    2    2    2    2    2    2    2    2    2    2    2    1    1    2 
##  513  514  515  516  517  518  519  520  521  522  523  524  525  526  527  528 
##    1    1    2    1    2    2    1    1    1    1    2    1    2    2    2    2 
##  529  530  531  532  533  534  535  536  537  538  539  540  541  542  543  544 
##    2    2    1    2    1    2    2    1    2    1    1    1    1    1    2    1 
##  545  546  547  548  549  550  551  552  553  554  555  556  557  558  559  560 
##    2    1    2    2    2    2    2    1    1    2    2    1    1    1    1    2 
##  561  562  563  564  565  566  567  568  569  570  571  572  573  574  575  576 
##    1    1    2    1    2    1    1    2    2    1    1    2    2    2    2    2 
##  577  578  579  580  581  582  583  584  585  586  587  588  589  590  591  592 
##    1    1    2    1    1    2    1    2    2    2    2    1    2    2    1    1 
##  593  594  595  596  597  598  599  600  601  602  603  604  605  606  607  608 
##    2    2    2    1    1    2    2    1    1    1    1    1    2    1    1    1 
##  609  610  611  612  613  614  615  616  617  618  619  620  621  622  623  624 
##    1    2    1    2    1    1    1    2    1    2    1    1    1    2    1    2 
##  625  626  627  628  629  630  631  632  633  634  635  636  637  638  639  640 
##    2    1    1    2    2    2    1    2    2    1    2    1    2    2    2    1 
##  641  642  643  644  645  646  647  648  649  650  651  652  653  654  655  656 
##    1    1    2    2    2    2    1    2    1    2    2    2    1    1    2    1 
##  657  658  659  660  661  662  663  664  665  666  667  668  669  670  671  672 
##    1    1    2    1    2    1    2    1    1    1    1    1    1    1    1    2 
##  673  674  675  676  677  678  679  680  681  682  683  684  685  686  687  688 
##    2    2    2    2    1    1    2    1    1    2    1    2    2    1    1    2 
##  689  690  691  692  693  694  695  696  697  698  699  700  701  702  703  704 
##    1    1    2    1    1    2    1    1    2    1    2    1    2    1    2    2 
##  705  706  707  708  709  710  711  712  713  714  715  716  717  718  719  720 
##    1    2    2    2    2    2    1    1    2    1    2    1    1    2    2    1 
##  721  722  723  724  725  726  727  728  729  730  731  732  733  734  735  736 
##    1    1    2    2    1    1    1    1    1    1    2    2    2    1    1    1 
##  737  738  739  740  741  742  743  744  745  746  747  748  749  750  751  752 
##    1    2    2    2    1    1    1    2    1    2    1    2    2    1    2    1 
##  753  754  755  756  757  758  759  760  761  762  763  764  765  766  767  768 
##    2    2    1    1    1    1    1    1    1    1    2    2    1    1    2    1 
##  769  770  771  772  773  774  775  776  777  778  779  780  781  782  783  784 
##    2    2    2    1    2    1    2    1    2    2    1    2    2    1    2    2 
##  785  786  787  788  789  790  791  792  793  794  795  796  797  798  799  800 
##    2    1    2    1    1    1    1    2    1    2    2    2    1    1    2    1 
##  801  802  803  804  805  806  807  808  809  810  811  812  813  814  815  816 
##    1    2    1    2    2    1    1    2    1    2    2    2    1    2    1    1 
##  817  818  819  820  821  822  823  824  825  826  827  828  829  830  831  832 
##    2    1    1    2    1    1    2    1    1    2    2    2    1    1    2    2 
##  833  834  835  836  837  838  839  840  841  842  843  844  845  846  847  848 
##    1    2    2    1    1    1    2    1    1    2    1    1    2    1    2    2 
##  849  850  851  852  853  854  855  856  857  858  859  860  861  862  863  864 
##    1    2    1    2    2    2    2    1    2    1    2    1    2    2    1    1 
##  865  866  867  868  869  870  871  872  873  874  875  876  877  878  879  880 
##    2    2    2    1    1    1    2    1    1    1    2    2    1    1    1    2 
##  881  882  883  884  885  886  887  888  889  890  891  892  893  894  895  896 
##    1    2    2    1    2    2    2    1    1    2    1    1    2    2    1    1 
##  897  898  899  900  901  902  903  904  905  906  907  908  909  910  911  912 
##    1    1    2    1    1    2    2    1    2    2    2    1    1    2    1    1 
##  913  914  915  916  917  918  919  920  921  922  923  924  925  926  927  928 
##    1    2    2    1    1    1    2    1    2    1    1    1    2    1    1    2 
##  929  930  931  932  933  934  935  936  937  938  939  940  941  942  943  944 
##    2    1    1    1    2    2    2    1    1    1    1    2    1    1    2    2 
##  945  946  947  948  949  950  951  952  953  954  955  956  957  958  959  960 
##    2    2    2    2    1    2    2    2    2    2    1    1    1    1    1    1 
##  961  962  963  964  965  966  967  968  969  970  971  972  973  974  975  976 
##    1    2    1    1    1    1    1    1    1    2    1    1    1    2    1    2 
##  977  978  979  980  981  982  983  984  985  986  987  988  989  990  991  992 
##    1    2    1    2    1    2    2    2    1    2    2    2    1    2    1    1 
##  993  994  995  996  997  998  999 1000 
##    2    2    2    1    2    1    1    2 
## 
## Within cluster sum of squares by cluster:
## [1] 4105.816 3878.629
##  (between_SS / total_SS =  11.2 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

k-means algorithm is used to identify and assign the data to two clusters so that each observation will be assigned to one of the two clusters. From the output, we can observe that two different clusters have been found with sizes 516 and 484, and the within cluster sum of square (WCSS) =11.2% meaning the clusters are kind of compacted. But we need to visualize it to have a better look.

Cluster Plot:

# 4- visualize clustering and install package
library(factoextra)
fviz_cluster(kmeans.result, data = cdataset)

The plot shows overlapping clusters, particularly in the middle, suggesting that some data points are challenging to assign to a specific cluster. An avegrage silhouette coefficient can be more precise so we will calculate it.

Average Silhouette Coefficient:

The value is between [-1, 1], a score of 1 denotes the best. And the worst value is -1. Values near 0 denote overlapping clusters.

#Average silhouette
library(cluster)
avg_sil <- silhouette(kmeans.result$cluster, dist(cdataset))
# k-means clustering with estimating k and initializations
fviz_silhouette(avg_sil)
##   cluster size ave.sil.width
## 1       1  516          0.11
## 2       2  484          0.11

The Average Silhouette Coefficient of 0.11 suggests that there is a certain level of similarity among the data points within the clusters formed through the clustering process. However, the coefficient is relatively low, approaching zero, indicating the presence of overlapping clusters.

BCubed precision and recall:

To measure the quality of the cluster the average BCubed precision and recall of all objects in the data set is computed:

# Cluster assignments and ground truth labels
cluster_assignments <- kmeans.result$cluster
ground_truth <- dataset$Risk

# Function to calculate BCubed precision and recall
calculate_bcubed_metrics <- function(cluster_assignments, ground_truth) {
  n <- length(cluster_assignments)
  precision_sum <- 0
  recall_sum <- 0

  for (i in 1:n) {
    cluster <- cluster_assignments[i]
    label <- ground_truth[i]

    # Count the number of items from the same category within the same cluster
    same_category_same_cluster <- sum(ground_truth[cluster_assignments == cluster] == label)

    # Count the total number of items in the same cluster
    total_same_cluster <- sum(cluster_assignments == cluster)

    # Count the total number of items with the same category
    total_same_category <- sum(ground_truth == label)

    # Calculate precision and recall for the current item and add them to the sums
    precision_sum <- precision_sum + same_category_same_cluster / total_same_cluster
    recall_sum <- recall_sum + same_category_same_cluster / total_same_category
  }
  precision <- precision_sum / n  # Calculate average precision 
  recall <- recall_sum / n        # Calculate average recall

  return(list(precision = precision, recall = recall)) }

# Calculate BCubed precision and recall
precision_recall <- calculate_bcubed_metrics(cluster_assignments, ground_truth)

# Extract precision and recall from the metrics
precision <- precision_recall$precision
recall <- precision_recall$recall

# Print the results
cat(" BCubed Precision:", precision, "\n","BCubed Recall:", recall)
##  BCubed Precision: 0.3299589 
##  BCubed Recall: 0.5317886

The calculated precision value is 0.32996 not a high value. It means that the clusters are not pure; meaning not all data points in a cluster belong to the same category.

On the other hand, the calculated recall value of 0.53179 implies that approximately half of the objrcts belonging to the same categore are correctly assigned to the same cluster.

Conclusion of K=2:

Considering upove results for K=2 in isolation, without considering our knowledge of the class label, it is evident that the performance is suboptimal (less than ideal). Therefore, it is recommended to explore other values for K in order to achieve better clustering results.

Clustering K= 4:

scaling the data:
# 2- prepreocessing 
#Data types should be transformed into numeric types before clustering.
cdataset <- scale(cdataset)

K-means:

# 1- run k-means clustering to find 4 clusters
#set a seed for random number generation  to make the results reproducible
set.seed(8953)

kmeans_result <- kmeans(cdataset, centers = 4, nstart = 25)

#Accessing kmeans_result
print(kmeans_result)
## K-means clustering with 4 clusters of sizes 240, 255, 244, 261
## 
## Cluster means:
##         isMale      isBlack   isSmoker   isDiabetic isHypertensive         Age
## 1 -0.004998499  0.098461545 -1.0320124 -0.002334427     1.00954535  0.04092810
## 2 -0.101538140 -1.061382078  0.9680116  0.124685876     0.02175491 -0.01063463
## 3  0.052771040  0.005581038 -1.0320124 -0.052221191    -0.98955436  0.02269775
## 4  0.054466405  0.941225616  0.9680116 -0.070853124    -0.02447174 -0.04846424
##      Systolic Cholesterol          HDL
## 1 -0.03065348 -0.08081696 -0.004490818
## 2  0.08003760  0.02566201 -0.052055040
## 3 -0.06987709  0.12065493  0.020586343
## 4  0.01531517 -0.06355382  0.035742390
## 
## Clustering vector:
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
##    1    1    4    4    2    2    2    2    2    1    2    1    1    4    1    2 
##   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
##    1    4    4    1    2    3    2    4    1    1    1    1    2    3    1    1 
##   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48 
##    1    4    4    1    2    1    2    3    4    2    3    3    3    3    4    2 
##   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64 
##    3    2    4    4    3    4    2    3    4    3    3    1    4    2    2    1 
##   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79   80 
##    4    3    1    4    4    3    2    4    2    4    3    1    3    3    4    4 
##   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   96 
##    4    4    1    1    1    3    3    3    4    1    4    4    3    2    4    2 
##   97   98   99  100  101  102  103  104  105  106  107  108  109  110  111  112 
##    1    1    2    2    1    2    1    4    4    3    1    3    3    2    2    2 
##  113  114  115  116  117  118  119  120  121  122  123  124  125  126  127  128 
##    4    2    1    3    3    1    4    2    4    3    4    1    3    4    3    3 
##  129  130  131  132  133  134  135  136  137  138  139  140  141  142  143  144 
##    2    4    3    1    4    4    1    3    2    2    1    2    3    1    2    4 
##  145  146  147  148  149  150  151  152  153  154  155  156  157  158  159  160 
##    1    3    2    3    3    2    2    4    4    3    4    2    3    1    1    2 
##  161  162  163  164  165  166  167  168  169  170  171  172  173  174  175  176 
##    1    4    2    4    4    2    1    1    2    4    1    2    3    2    2    2 
##  177  178  179  180  181  182  183  184  185  186  187  188  189  190  191  192 
##    4    1    1    3    3    1    3    3    4    1    1    1    2    2    1    3 
##  193  194  195  196  197  198  199  200  201  202  203  204  205  206  207  208 
##    3    3    3    4    2    1    1    3    1    3    1    1    2    4    3    4 
##  209  210  211  212  213  214  215  216  217  218  219  220  221  222  223  224 
##    4    3    2    2    2    4    4    3    1    4    3    4    1    2    4    2 
##  225  226  227  228  229  230  231  232  233  234  235  236  237  238  239  240 
##    2    4    3    1    4    4    3    1    1    1    3    2    4    3    2    2 
##  241  242  243  244  245  246  247  248  249  250  251  252  253  254  255  256 
##    2    3    4    3    3    3    3    4    3    1    1    3    4    2    1    2 
##  257  258  259  260  261  262  263  264  265  266  267  268  269  270  271  272 
##    2    4    3    4    3    2    1    1    1    3    2    3    1    4    2    1 
##  273  274  275  276  277  278  279  280  281  282  283  284  285  286  287  288 
##    1    3    1    4    2    4    4    4    4    1    2    3    3    2    2    3 
##  289  290  291  292  293  294  295  296  297  298  299  300  301  302  303  304 
##    3    1    4    3    4    2    4    2    3    1    1    1    2    4    1    2 
##  305  306  307  308  309  310  311  312  313  314  315  316  317  318  319  320 
##    3    1    2    2    2    2    2    2    1    4    4    2    2    4    4    1 
##  321  322  323  324  325  326  327  328  329  330  331  332  333  334  335  336 
##    2    1    2    1    1    2    2    4    1    3    4    3    3    4    3    1 
##  337  338  339  340  341  342  343  344  345  346  347  348  349  350  351  352 
##    2    2    3    1    3    3    3    1    4    4    3    1    3    2    2    1 
##  353  354  355  356  357  358  359  360  361  362  363  364  365  366  367  368 
##    1    1    2    2    3    1    1    1    1    1    1    3    4    4    3    4 
##  369  370  371  372  373  374  375  376  377  378  379  380  381  382  383  384 
##    1    2    3    1    2    1    3    2    2    3    4    3    1    1    3    2 
##  385  386  387  388  389  390  391  392  393  394  395  396  397  398  399  400 
##    2    1    1    3    4    4    2    3    2    2    2    4    3    2    3    4 
##  401  402  403  404  405  406  407  408  409  410  411  412  413  414  415  416 
##    2    3    4    2    3    1    2    2    4    4    3    3    1    3    2    4 
##  417  418  419  420  421  422  423  424  425  426  427  428  429  430  431  432 
##    2    4    4    2    3    1    2    1    1    2    3    3    1    2    4    3 
##  433  434  435  436  437  438  439  440  441  442  443  444  445  446  447  448 
##    4    1    2    1    1    4    2    2    2    3    2    3    2    4    1    3 
##  449  450  451  452  453  454  455  456  457  458  459  460  461  462  463  464 
##    4    2    3    2    3    4    1    2    1    4    4    1    4    2    4    3 
##  465  466  467  468  469  470  471  472  473  474  475  476  477  478  479  480 
##    2    2    4    2    2    2    4    4    4    4    3    3    3    2    1    2 
##  481  482  483  484  485  486  487  488  489  490  491  492  493  494  495  496 
##    4    3    1    2    2    4    3    4    2    3    4    1    2    2    1    2 
##  497  498  499  500  501  502  503  504  505  506  507  508  509  510  511  512 
##    2    2    1    3    3    3    1    3    3    3    3    3    1    4    2    3 
##  513  514  515  516  517  518  519  520  521  522  523  524  525  526  527  528 
##    4    4    3    2    3    1    2    2    2    4    3    4    3    1    1    1 
##  529  530  531  532  533  534  535  536  537  538  539  540  541  542  543  544 
##    1    3    4    3    4    1    3    2    1    4    4    4    2    4    3    4 
##  545  546  547  548  549  550  551  552  553  554  555  556  557  558  559  560 
##    1    2    3    1    1    1    1    4    2    1    1    4    2    4    4    1 
##  561  562  563  564  565  566  567  568  569  570  571  572  573  574  575  576 
##    4    2    1    4    1    4    4    1    3    2    4    3    1    3    1    3 
##  577  578  579  580  581  582  583  584  585  586  587  588  589  590  591  592 
##    2    2    1    2    2    1    4    1    1    3    3    4    1    3    4    2 
##  593  594  595  596  597  598  599  600  601  602  603  604  605  606  607  608 
##    3    1    1    4    4    3    1    4    2    4    2    4    3    2    2    2 
##  609  610  611  612  613  614  615  616  617  618  619  620  621  622  623  624 
##    4    1    2    1    2    4    4    1    4    3    2    4    4    3    4    3 
##  625  626  627  628  629  630  631  632  633  634  635  636  637  638  639  640 
##    3    4    2    1    1    1    2    1    1    2    3    2    3    3    3    2 
##  641  642  643  644  645  646  647  648  649  650  651  652  653  654  655  656 
##    2    2    3    3    1    1    2    3    2    3    3    1    4    4    1    4 
##  657  658  659  660  661  662  663  664  665  666  667  668  669  670  671  672 
##    2    4    3    4    3    4    3    4    4    4    2    4    4    4    4    1 
##  673  674  675  676  677  678  679  680  681  682  683  684  685  686  687  688 
##    1    3    1    3    2    4    1    2    4    1    4    3    3    4    2    1 
##  689  690  691  692  693  694  695  696  697  698  699  700  701  702  703  704 
##    2    4    1    2    4    1    4    4    1    2    3    4    3    4    3    3 
##  705  706  707  708  709  710  711  712  713  714  715  716  717  718  719  720 
##    4    3    1    3    1    1    4    4    3    2    3    2    2    3    1    4 
##  721  722  723  724  725  726  727  728  729  730  731  732  733  734  735  736 
##    4    4    1    1    2    2    2    4    4    2    1    3    1    2    4    4 
##  737  738  739  740  741  742  743  744  745  746  747  748  749  750  751  752 
##    2    1    1    3    4    2    4    3    4    3    2    1    1    4    3    2 
##  753  754  755  756  757  758  759  760  761  762  763  764  765  766  767  768 
##    3    1    4    2    2    4    2    4    4    2    1    3    2    2    3    4 
##  769  770  771  772  773  774  775  776  777  778  779  780  781  782  783  784 
##    3    3    3    2    3    4    3    4    1    3    4    1    1    4    3    3 
##  785  786  787  788  789  790  791  792  793  794  795  796  797  798  799  800 
##    3    2    3    4    2    2    4    1    4    3    3    3    2    2    3    4 
##  801  802  803  804  805  806  807  808  809  810  811  812  813  814  815  816 
##    2    1    4    3    3    2    2    3    4    3    3    1    2    1    4    2 
##  817  818  819  820  821  822  823  824  825  826  827  828  829  830  831  832 
##    1    4    2    3    4    4    1    4    4    3    1    1    2    2    3    1 
##  833  834  835  836  837  838  839  840  841  842  843  844  845  846  847  848 
##    2    3    3    2    4    2    3    4    4    1    4    4    3    4    1    1 
##  849  850  851  852  853  854  855  856  857  858  859  860  861  862  863  864 
##    2    1    4    1    3    1    3    2    3    4    3    2    3    3    2    4 
##  865  866  867  868  869  870  871  872  873  874  875  876  877  878  879  880 
##    3    3    1    4    2    2    1    4    2    4    3    3    4    2    4    3 
##  881  882  883  884  885  886  887  888  889  890  891  892  893  894  895  896 
##    4    1    1    2    3    1    1    2    4    3    4    4    1    1    4    2 
##  897  898  899  900  901  902  903  904  905  906  907  908  909  910  911  912 
##    2    4    3    2    2    1    1    4    3    3    1    4    2    3    4    4 
##  913  914  915  916  917  918  919  920  921  922  923  924  925  926  927  928 
##    2    3    1    2    4    2    3    2    1    2    2    2    1    4    2    3 
##  929  930  931  932  933  934  935  936  937  938  939  940  941  942  943  944 
##    1    2    4    2    3    1    3    4    4    2    2    1    4    2    3    1 
##  945  946  947  948  949  950  951  952  953  954  955  956  957  958  959  960 
##    1    3    1    1    4    1    1    1    1    1    4    2    4    4    4    2 
##  961  962  963  964  965  966  967  968  969  970  971  972  973  974  975  976 
##    4    3    2    4    2    2    4    4    2    3    4    4    4    1    2    1 
##  977  978  979  980  981  982  983  984  985  986  987  988  989  990  991  992 
##    4    1    4    3    2    3    1    3    2    3    1    3    4    1    4    2 
##  993  994  995  996  997  998  999 1000 
##    1    1    3    2    3    4    4    3 
## 
## Within cluster sum of squares by cluster:
## [1] 1648.259 1799.286 1739.876 1778.161
##  (between_SS / total_SS =  22.5 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

We can observe that four different clusters have been found with sizes 240 , 255 ,244 and 261. And the within cluster sum of square (WCSS) =22.5% which means that the cluster less compact and cohesive. Its higher than 2 clusters result which means 2 clusters are better in terms of compactness.

Cluster plot :

# 2- visualize clustering and install package
library(factoextra)
fviz_cluster(kmeans_result, data = cdataset)

As we can see In the cluster plot, it’s evident that there are overlapping clusters.

Average Silhouette Coefficient:

#3-Average silhouette
library(cluster)
avg_sil <- silhouette(kmeans_result$cluster, dist(cdataset))
# k-means clustering with estimating k and initializations
fviz_silhouette(avg_sil)
##   cluster size ave.sil.width
## 1       1  240          0.13
## 2       2  255          0.12
## 3       3  244          0.12
## 4       4  261          0.13

An Average Silhouette coefficient of 0.12 indicate that the clustering is not very well-defined, and there is ambiguity and overlap between clusters. However, the result is higher than 2 clusters.

BCubed precision and recall:

# Cluster assignments and ground truth labels
cluster_assignments <- kmeans_result$cluster
ground_truth <- dataset$Risk

# Function to calculate BCubed precision and recall
calculate_bcubed_metrics <- function(cluster_assignments, ground_truth) {
  n <- length(cluster_assignments)
  precision_sum <- 0
  recall_sum <- 0

  for (i in 1:n) {
    cluster <- cluster_assignments[i]
    label <- ground_truth[i]

    # Count the number of items from the same category within the same cluster
    same_category_same_cluster <- sum(ground_truth[cluster_assignments == cluster] == label)

    # Count the total number of items in the same cluster
    total_same_cluster <- sum(cluster_assignments == cluster)

    # Count the total number of items with the same category
    total_same_category <- sum(ground_truth == label)

    # Calculate precision and recall for the current item and add them to the sums
    precision_sum <- precision_sum + same_category_same_cluster / total_same_cluster
    recall_sum <- recall_sum + same_category_same_cluster / total_same_category
  }
  precision <- precision_sum / n  # Calculate average precision 
  recall <- recall_sum / n        # Calculate average recall

  return(list(precision = precision, recall = recall)) }

# Calculate BCubed precision and recall
precision_recall <- calculate_bcubed_metrics(cluster_assignments, ground_truth)

# Extract precision and recall from the metrics
precision <- precision_recall$precision
recall <- precision_recall$recall

# Print the results
cat(" BCubed Precision:", precision, "\n","BCubed Recall:", recall)
##  BCubed Precision: 0.336335 
##  BCubed Recall: 0.2729542

The calculated precision value is 0.336335 not a high value it mean the clusters are not pure.and not all data points in a cluster belong to the same category.

The calculated recall value is 0.2729542 it’s a low result meaning most of the data are not in the same cluster.

Conclusion of K=4:

After applying various evaluation metrics such as the average silhouette coefficient, within-cluster sum of squares ,Bcubed precision and recall.it became clear to us that k=4 Is not a good number of clusters since there is overlapping and the clusters are not pure .And the within cluster sum of square 4 clusters has a higher value than 2 cluster indicating that the 4 clusters less compact .but According to the number of class label its the best among the considered options.

Clustering K=8 :

scaling the data:
# 2- prepreocessing 
#Data types should be transformed into numeric types before clustering.
cdataset <- scale(cdataset)

K-means:

# 3- run k-means clustering to find 8 clusters
#set a seed for random number generation  to make the results reproducible
set.seed(8953)
kmeansresult <- kmeans(cdataset,8)
# print the clusterng result
kmeansresult
## K-means clustering with 8 clusters of sizes 136, 149, 100, 132, 122, 93, 139, 129
## 
## Cluster means:
##       isMale    isBlack    isSmoker  isDiabetic isHypertensive         Age
## 1  0.6374557  0.9412256  0.96801163 -0.11758451      0.4803719 -0.42674815
## 2  0.9928563  0.1348064 -1.03201240 -0.06416429      0.3789569 -0.26292001
## 3  0.8197539 -0.2403129 -0.09200111 -0.06403000     -0.9895544  0.72578390
## 4 -0.9645589  0.5467726 -0.78958524  0.24399312      0.4189023  0.34795779
## 5 -0.6683239 -0.4868635  0.60735156 -0.12602627     -0.9895544 -0.76525097
## 6 -0.4852307  0.3598234  0.86048345  0.31098443     -0.7316060  0.75221709
## 7 -0.3324182 -0.2401689 -1.03201240 -0.23835629     -0.1410156 -0.05978714
## 8 -0.1272486 -1.0613821  0.96801163  0.14986867      1.0095454  0.08076629
##      Systolic Cholesterol         HDL
## 1  0.06575189 -0.11568304 -0.01941434
## 2 -0.44527267 -0.11897944 -0.04664307
## 3 -0.13781479  0.97280405  0.09080812
## 4 -0.67231955  0.46294127  0.02679507
## 5 -0.42367631 -0.08839685 -0.13312255
## 6  0.50518690 -0.83938009  0.32754430
## 7  1.08076911 -0.37828447 -0.03913655
## 8  0.11170739  0.12791071 -0.09153707
## 
## Clustering vector:
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
##    2    7    1    1    8    8    5    3    8    7    3    7    2    1    2    6 
##   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
##    2    1    1    7    5    3    8    1    2    2    4    2    8    3    7    4 
##   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48 
##    2    1    6    2    6    2    3    3    6    5    3    7    4    3    5    5 
##   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64 
##    2    6    1    1    4    5    8    7    6    5    3    4    5    8    8    2 
##   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79   80 
##    1    4    2    1    6    2    5    1    8    6    4    4    3    2    1    6 
##   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   96 
##    6    6    4    4    2    2    7    2    5    7    5    6    4    8    1    8 
##   97   98   99  100  101  102  103  104  105  106  107  108  109  110  111  112 
##    4    2    5    8    7    3    4    1    5    7    7    5    4    5    5    6 
##  113  114  115  116  117  118  119  120  121  122  123  124  125  126  127  128 
##    6    3    4    7    5    2    6    8    3    7    6    2    2    1    7    2 
##  129  130  131  132  133  134  135  136  137  138  139  140  141  142  143  144 
##    8    4    4    7    3    1    4    4    6    8    2    5    7    2    6    1 
##  145  146  147  148  149  150  151  152  153  154  155  156  157  158  159  160 
##    2    3    8    3    3    5    8    1    1    2    1    8    6    4    7    5 
##  161  162  163  164  165  166  167  168  169  170  171  172  173  174  175  176 
##    2    1    8    1    1    8    4    2    5    6    4    5    7    8    8    5 
##  177  178  179  180  181  182  183  184  185  186  187  188  189  190  191  192 
##    1    2    4    2    3    2    3    7    5    2    4    7    8    8    4    7 
##  193  194  195  196  197  198  199  200  201  202  203  204  205  206  207  208 
##    2    3    7    3    8    2    7    2    7    2    4    2    8    1    4    1 
##  209  210  211  212  213  214  215  216  217  218  219  220  221  222  223  224 
##    6    7    3    8    8    1    6    2    7    1    7    1    4    6    1    8 
##  225  226  227  228  229  230  231  232  233  234  235  236  237  238  239  240 
##    8    3    7    2    1    3    7    4    7    2    7    8    6    7    8    8 
##  241  242  243  244  245  246  247  248  249  250  251  252  253  254  255  256 
##    8    4    1    4    7    4    2    1    3    2    2    7    5    8    4    8 
##  257  258  259  260  261  262  263  264  265  266  267  268  269  270  271  272 
##    5    1    4    3    5    6    4    4    7    5    8    3    7    1    8    7 
##  273  274  275  276  277  278  279  280  281  282  283  284  285  286  287  288 
##    4    7    2    5    8    6    1    5    6    2    8    3    5    5    5    4 
##  289  290  291  292  293  294  295  296  297  298  299  300  301  302  303  304 
##    2    2    6    7    1    8    1    8    7    4    4    7    6    1    2    5 
##  305  306  307  308  309  310  311  312  313  314  315  316  317  318  319  320 
##    7    7    8    5    8    3    5    8    7    6    1    3    8    1    1    4 
##  321  322  323  324  325  326  327  328  329  330  331  332  333  334  335  336 
##    5    2    6    7    2    8    5    4    2    3    6    7    3    5    7    2 
##  337  338  339  340  341  342  343  344  345  346  347  348  349  350  351  352 
##    8    8    4    4    2    2    7    4    1    1    7    4    3    8    3    4 
##  353  354  355  356  357  358  359  360  361  362  363  364  365  366  367  368 
##    2    7    5    8    4    7    4    4    2    2    2    3    1    1    2    4 
##  369  370  371  372  373  374  375  376  377  378  379  380  381  382  383  384 
##    4    3    7    4    8    7    2    5    8    7    1    7    2    2    3    6 
##  385  386  387  388  389  390  391  392  393  394  395  396  397  398  399  400 
##    6    2    4    3    1    1    8    4    8    5    8    1    7    8    2    4 
##  401  402  403  404  405  406  407  408  409  410  411  412  413  414  415  416 
##    8    7    1    8    7    2    3    8    1    6    7    3    7    5    8    1 
##  417  418  419  420  421  422  423  424  425  426  427  428  429  430  431  432 
##    5    1    4    8    2    4    8    4    2    8    7    4    4    5    3    7 
##  433  434  435  436  437  438  439  440  441  442  443  444  445  446  447  448 
##    1    2    8    4    7    5    8    5    8    2    8    3    6    6    4    3 
##  449  450  451  452  453  454  455  456  457  458  459  460  461  462  463  464 
##    3    8    2    8    7    1    4    3    4    1    1    2    6    6    1    2 
##  465  466  467  468  469  470  471  472  473  474  475  476  477  478  479  480 
##    8    8    6    8    6    5    3    6    1    6    7    5    7    5    2    3 
##  481  482  483  484  485  486  487  488  489  490  491  492  493  494  495  496 
##    5    4    7    8    5    1    6    6    8    2    1    2    8    5    2    8 
##  497  498  499  500  501  502  503  504  505  506  507  508  509  510  511  512 
##    8    5    2    4    4    2    2    5    5    2    4    5    7    1    8    5 
##  513  514  515  516  517  518  519  520  521  522  523  524  525  526  527  528 
##    6    6    6    8    3    7    8    6    6    4    5    4    2    7    4    2 
##  529  530  531  532  533  534  535  536  537  538  539  540  541  542  543  544 
##    7    3    1    5    1    7    2    3    4    6    1    1    3    3    7    1 
##  545  546  547  548  549  550  551  552  553  554  555  556  557  558  559  560 
##    2    3    2    7    2    4    4    6    3    7    7    1    8    5    4    7 
##  561  562  563  564  565  566  567  568  569  570  571  572  573  574  575  576 
##    6    5    4    1    2    1    6    2    6    3    6    7    4    4    2    3 
##  577  578  579  580  581  582  583  584  585  586  587  588  589  590  591  592 
##    3    8    7    8    8    4    6    4    2    2    2    4    4    7    6    8 
##  593  594  595  596  597  598  599  600  601  602  603  604  605  606  607  608 
##    4    7    2    4    1    7    4    5    5    3    8    3    3    8    6    8 
##  609  610  611  612  613  614  615  616  617  618  619  620  621  622  623  624 
##    1    2    5    4    3    1    6    2    1    3    8    6    1    3    4    7 
##  625  626  627  628  629  630  631  632  633  634  635  636  637  638  639  640 
##    2    5    6    2    4    7    8    2    7    5    4    6    3    7    3    5 
##  641  642  643  644  645  646  647  648  649  650  651  652  653  654  655  656 
##    8    5    7    7    7    2    5    7    5    2    5    2    6    1    7    1 
##  657  658  659  660  661  662  663  664  665  666  667  668  669  670  671  672 
##    8    1    3    4    3    5    7    1    1    1    5    1    3    1    1    4 
##  673  674  675  676  677  678  679  680  681  682  683  684  685  686  687  688 
##    4    3    4    3    8    5    2    5    1    4    1    7    2    5    6    2 
##  689  690  691  692  693  694  695  696  697  698  699  700  701  702  703  704 
##    3    6    7    6    5    2    6    1    4    8    7    6    7    1    3    7 
##  705  706  707  708  709  710  711  712  713  714  715  716  717  718  719  720 
##    6    7    4    4    2    7    1    1    7    5    7    6    8    5    7    6 
##  721  722  723  724  725  726  727  728  729  730  731  732  733  734  735  736 
##    5    1    4    7    8    8    8    1    1    6    2    2    4    5    1    5 
##  737  738  739  740  741  742  743  744  745  746  747  748  749  750  751  752 
##    8    4    2    4    6    5    1    2    1    2    5    4    2    5    3    8 
##  753  754  755  756  757  758  759  760  761  762  763  764  765  766  767  768 
##    4    2    5    3    8    6    3    6    6    5    2    5    5    8    3    1 
##  769  770  771  772  773  774  775  776  777  778  779  780  781  782  783  784 
##    3    7    3    8    7    6    2    1    4    7    5    2    4    1    3    2 
##  785  786  787  788  789  790  791  792  793  794  795  796  797  798  799  800 
##    3    5    4    5    8    8    3    2    5    4    4    5    5    3    2    1 
##  801  802  803  804  805  806  807  808  809  810  811  812  813  814  815  816 
##    3    2    1    5    7    8    8    2    5    7    2    4    6    2    6    3 
##  817  818  819  820  821  822  823  824  825  826  827  828  829  830  831  832 
##    7    1    3    3    1    4    4    1    1    4    4    7    5    5    5    4 
##  833  834  835  836  837  838  839  840  841  842  843  844  845  846  847  848 
##    5    4    7    8    1    5    5    5    5    2    5    1    3    4    7    2 
##  849  850  851  852  853  854  855  856  857  858  859  860  861  862  863  864 
##    8    2    6    2    4    2    7    3    7    1    5    8    7    7    3    4 
##  865  866  867  868  869  870  871  872  873  874  875  876  877  878  879  880 
##    3    3    4    6    8    5    2    6    5    1    2    7    6    8    1    7 
##  881  882  883  884  885  886  887  888  889  890  891  892  893  894  895  896 
##    1    7    4    8    7    7    2    5    1    7    1    1    2    7    1    5 
##  897  898  899  900  901  902  903  904  905  906  907  908  909  910  911  912 
##    6    4    7    5    8    2    4    1    2    7    2    1    8    3    1    1 
##  913  914  915  916  917  918  919  920  921  922  923  924  925  926  927  928 
##    8    3    2    8    1    3    7    8    7    8    5    5    4    1    8    3 
##  929  930  931  932  933  934  935  936  937  938  939  940  941  942  943  944 
##    2    3    5    8    4    2    4    6    3    8    6    4    1    5    7    2 
##  945  946  947  948  949  950  951  952  953  954  955  956  957  958  959  960 
##    7    7    4    7    1    4    4    7    2    4    1    5    1    1    1    8 
##  961  962  963  964  965  966  967  968  969  970  971  972  973  974  975  976 
##    6    2    3    1    5    8    1    1    5    3    6    1    1    2    5    7 
##  977  978  979  980  981  982  983  984  985  986  987  988  989  990  991  992 
##    6    7    1    3    8    7    2    4    8    7    2    2    6    2    5    8 
##  993  994  995  996  997  998  999 1000 
##    2    2    7    8    6    1    6    7 
## 
## Within cluster sum of squares by cluster:
## [1] 797.7227 949.7918 641.1214 793.4957 737.4096 520.4053 918.1900 761.7654
##  (between_SS / total_SS =  31.9 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

We can observe that the eight different clusters have been found with sizes 136, 149, 100, 132,93, 139 and 129 respectively, and the within cluster sum of square (WCSS) = 31.9%. which is higher than 2 and 4 clusters result which means 2,4 clusters are better in terms of compactness or homogeneity compared to the clustering result of 8 clusters.

Cluster Plot:

# 2- visualize clustering and install package
library(factoextra)
fviz_cluster(kmeansresult, data = cdataset)

It’s clear that the eight clusters are overlapping.

Average Silhouette Coefficient:

#Average silhouette
library(cluster)
avg_sil <- silhouette(kmeansresult$cluster, dist(cdataset))
# k-means clustering with estimating k and initializations
fviz_silhouette(avg_sil)
##   cluster size ave.sil.width
## 1       1  136          0.12
## 2       2  149          0.09
## 3       3  100          0.08
## 4       4  132          0.12
## 5       5  122          0.10
## 6       6   93          0.13
## 7       7  139          0.06
## 8       8  129          0.12

An Average Silhouette Coefficient of 0.1 indicates that, the clusters formed in the clustering process have some degree of similarity among their data points. However, the result is lower than 2 clusters which has silhouette coefficient average of 0.11 and also lower than K=4 clusters that is equal to 0.12.

BCubed precision and recall:

# Cluster assignments and ground truth labels
cluster_assignments <- kmeansresult$cluster
ground_truth <- dataset$Risk

# Function to calculate BCubed precision and recall
calculate_bcubed_metrics <- function(cluster_assignments, ground_truth) {
  n <- length(cluster_assignments)
  precision_sum <- 0
  recall_sum <- 0

  for (i in 1:n) {
    cluster <- cluster_assignments[i]
    label <- ground_truth[i]

    # Count the number of items from the same category within the same cluster
    same_category_same_cluster <- sum(ground_truth[cluster_assignments == cluster] == label)

    # Count the total number of items in the same cluster
    total_same_cluster <- sum(cluster_assignments == cluster)

    # Count the total number of items with the same category
    total_same_category <- sum(ground_truth == label)

    # Calculate precision and recall for the current item and add them to the sums
    precision_sum <- precision_sum + same_category_same_cluster / total_same_cluster
    recall_sum <- recall_sum + same_category_same_cluster / total_same_category
  }
  precision <- precision_sum / n  # Calculate average precision 
  recall <- recall_sum / n        # Calculate average recall

  return(list(precision = precision, recall = recall)) }

# Calculate BCubed precision and recall
precision_recall <- calculate_bcubed_metrics(cluster_assignments, ground_truth)

# Extract precision and recall from the metrics
precision <- precision_recall$precision
recall <- precision_recall$recall

# Print the results
cat(" BCubed Precision:", precision, "\n","BCubed Recall:", recall)
##  BCubed Precision: 0.3747497 
##  BCubed Recall: 0.1554135

The calculated precision value is 0.37478 not a high value it mean the clusters are not pure.

The calculated recall value is 0.15541 it’s a low result meaning most of the data are not in the same cluster.

Conclusion of K=8:

Is not a good number of clusters especially when compared to the results obtained with K=2 and K=4 clusters. This conclusion is based on various evaluation metrics such as the average silhouette coefficient, within-cluster sum of squares, and Bcubed precision and recall. In all aspects, K=8 performed the worst. Additionally, considering the presence of class labels and our prior knowledge of the data set, we know the actual number of groups within the class label. So, by also taking this information into account, we can determine that K=8 is not an optimal number of clusters.

Validation:

library(NbClust)
#a)fviz_nbclust() with silhouette method using library(factoextra) 
fviz_nbclust(cdataset, kmeans, method = "silhouette")+
  labs(subtitle = "Silhouette method")

#b) NbClust validation
fres.nbclust <- NbClust(cdataset, distance="euclidean", min.nc = 2, max.nc = 10, method="kmeans", index="all")
## Warning in log(det(P)/det(W)): NaNs produced

## Warning in log(det(P)/det(W)): NaNs produced

## Warning in log(det(P)/det(W)): NaNs produced

## Warning in log(det(P)/det(W)): NaNs produced

## Warning in log(det(P)/det(W)): NaNs produced

## Warning in log(det(P)/det(W)): NaNs produced

## Warning in log(det(P)/det(W)): NaNs produced
## Warning: did not converge in 10 iterations

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
## 

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 6 proposed 2 as the best number of clusters 
## * 3 proposed 3 as the best number of clusters 
## * 8 proposed 4 as the best number of clusters 
## * 1 proposed 5 as the best number of clusters 
## * 1 proposed 7 as the best number of clusters 
## * 2 proposed 9 as the best number of clusters 
## * 2 proposed 10 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  4 
##  
##  
## *******************************************************************

According to the NbClust validation method, which utilizes the majority rule, the best number of clusters is 4. This number contradicts the initial suggestion from the silhouette method, which indicated that the best number of clusters is 2. However, upon revisiting the calculations and evaluating the performance, it is almost accurate to conclude that K=4 indeed performs the best among the considered options.